Mosaic protein and restriction endonuclease assisted ligation method for making the same

ABSTRACT

A mosaic protein comprising a variety of immunoreactive antigenic epitopes from several genotypes of hepatitis C virus. The mosaic protein provides a sensitive and specific immunological hepatitis detection assay. A restriction enzyme assisted ligation method of making an artificial gene permits controlled construction of mosaic proteins, and allows confirmatory expression of the intermediate gene products.

FIELD OF THE INVENTION

The present invention relates to the field of recombinant production of proteins from synthetic genes. In particular, the invention relates to the expression of mosaic proteins constructed from antigenic peptides of the hepatitis C virus.

BACKGROUND OF THE INVENTION

The hepatitis C virus (HCV) is classified as part of the Flaviviridae family and contains a single, positive strand of RNA approximately 9400 nucleotides long, which encodes a polyprotein of at least 3000 amino acids, depending on the source of the viral isolate. (Choo Q-L., et al., Proc. Natl. Acad. Sci. USA 88:2451-2455, 1991; Choo Q-L., et al., Science 244:359-362, 1989; Kato N., et al., Proc. Natl. Acad. Sci. USA 87:9524-9528, 1990; Takamizawa A., et al., J. Virol. 65:1105-1113, 1991) The 5'-end of the genome encodes the structural proteins, which include the nucleocapsid protein (C) and two envelope proteins (E1 and E2/NS1), whereas the 3'-end of the genome encodes the non-structural proteins, which include the NS2, NS3, NS4, and NS5 proteins. (Miller R. H. and Purcell R. H., Proc. Natl. Acad. Sci. USA 87:2057-2061, 1990; Takeuchi K., et al., J. Gen. Virol. 71:3027-3033, 1990; Choo Q-L., et al., Proc. Natl. Acad. Sci. USA 88:2451-2455, 1991; Hijikata M., et al., Proc. Natl. Acad. Sci. USA 88:5547-5551, 1991; Takamizawa A., et al., J. Virol. 65:1105-1113, 1991; Houghton M., et al., Hepatology 14:381-388, 1991) The hepatitis C virus is the main causative agent of non-A, non-B hepatitis and plays a major role in the development of chronic liver disease, liver cirrhosis, and hepatocellular carcinoma worldwide. (Choo Q-L., et al., Proc. Natl. Acad. Sci. USA 88:2451-2455, 1991) Since there is no vaccine and no effective therapy for HCV-induced disease, diagnosis and prevention of infection are issues of major public health importance.

In an effort to improve the efficiency of HCV diagnosis, many antigenic regions have been identified along the HCV genome and used to develop three generations of enzyme immunoassays for the detection of anti-HCV activity in human sera. (Kuo et al., Science 244:362-364, 1989; Chien D. Y., et al., Proc. Natl. Acad. Sci. USA 89:10011-10015, 1992) Each successive generation improved both the sensitivity and the specificity of the enzyme immunoassay (EIA) by adding more antigenic regions to the assay. The first generation of enzyme immunoassays relied on the detection of antibodies to 5-1-1, a region within a non-structural protein. Second and third generation assays were based on the detection of antibodies against the recombinant 22 kDa core or nucleocapsid (NC) protein and several recombinant proteins derived from non-structural regions of the viral polyprotein (NS3, NS4, NS5).

Although the improvements in the specificity and sensitivity of the EIAs have resulted in a reduction in the number of new HCV infections (Alter H. J., Blood 85(7):1681-1695, 1995), many investigators have indicated that the current versions still require further development. (Tobler L. H., et al., Transfusion 34:130-134, 1994; Courouce A. M., et al., Transfusion 34:790-795, 1994; Damen M., et al., Transfusion 35:745-749, 1995; Feucht H. H., et al., J. Med. Virol. 48:184-190, 1995; Bar-Shany S., et al., Int. J. Epidemiol. 25:674-677, 1996; Dhaliwal S. K., et al., J. Med. Virol. 48:184-190, 1996; Pawlotsky J. M., et al., J. Clin. Micro. Jan:80-83, 1996)

The impetus to improve tests for the detection of anti-HCV is based upon studies demonstrating that currently available EIAs have relatively poor specificities, especially in low-prevalence populations. (Alter H. J., Blood 85(7):1681-1695, 1995; Feucht H. H., et al., J. Med. Virol. 48:184-190, 1995) Additionally, even after the development of supplemental tests used to confirm EIA-positive sera, such as MATRIX (Abbott Laboratories, Abbott Park, Ill.), 10% of specimens are still classified as indeterminate (reactive to a single antigen) following supplemental testing. (Pawlotsky J. M., et al., J. Clin. Micro. Jan:80-83, 1996) These findings might be due to testing sera during the very early stage of infection, before all antibodies reach detectable levels. Alternatively, reactivity to a single antigen may reflect non-specific reactivity of the specimen.

Another important limitation of currently available assays is the use of genotype 1 recombinant proteins as immunologic targets. Recently, it was reported that the serologic reactivity of the current EIAs differs among HCV genotypes. (Zein N. N., et al., Mayo Clinic Proc. 70(5):449-452, 1995; Dhaliwal S. K., et al., J. Med. Virol. 48:184-190, 1996) This observation suggests that the current EIAs, which are based on genotype 1 HCV, may need to be further improved by including antigenic epitopes from different genotypes. Therefore, there remains a strong need for an enzyme immunoassay with increased specificity and sensitivity that reacts with sera from individuals infected with any of multiple genotypes of the hepatitis C virus.

To create an enzyme immunoassay with broad reactivity to multiple genotypes, a synthetic protein must be assembled from a long DNA fragment containing multiple antigenic epitopes. The synthesis of long artificial DNA polynucleotides has been made possible by highly efficient methods for chemically synthesizing relatively short oligonucleotides. To assemble a gene from oligonucleotides, several enzymatic reactions using polymerases and/or ligases may be used. Two methods, the polymerase chain reaction (PCR) and the Exchangeable Template Reaction (ETR), have been successfully applied to assemble synthetic genes from oligonucleotides (Khudyakov Y. E., et al., Nucleic Acids Res. 21(11):2747-2754, 1993; Khudyakov Y. E., et al., J. Virol. 68(11):7067-7074, 1994; and U.S. Pat. No. 5,563,032). The use of PCR, however, is disadvantageous when repeated sequences are designed into the gene, while ETR cannot conveniently be used to express short fragments of the synthetic gene. Therefore, a new method of assembling synthetic genes with repeated sequences that allows the expression of shorter fragments of the gene would greatly facilitate the creation of a synthetic protein for use in an improved enzyme immunoassay, in particular for the detection of HCV.

SUMMARY OF THE INVENTION

The present invention relates to methods and compositions used to improve the sensitivity, the spectrum of immunoreactivity, and the specificity of antigens used as immunologic targets for detection. In preferred embodiments, the detection can be performed by enzyme immunoassay (EIA). The method, designated Restriction Endonuclease Assisted Ligation (REAL), involves the construction of an artificial gene from synthetic oligonucleotides. The compositions are synthetic proteins composed of a mosaic of broadly immunoreactive antigenic epitopes from several genotypes of, for example, HCV.

REAL employs the Klenow fragment of DNA Polymerase I to convert specially designed complementary oligonucleotides into double-stranded DNA fragments, which are subsequently amplified by PCR. Restriction sites engineered into the cloning vector are used to produce complementary overhangs for the addition of consecutive fragments. Each fragment may be cloned and expressed individually, for example in Escherichia coli, to determine its immunoreactivity, or may be assembled into the full-length product without cloning. Two consecutive fragments are subsequently ligated, amplified by PCR, cleaved with restriction endonucleases, and ligated with DNA ligase, so that each fragment is assembled into a longer fragment in a consecutive process. By repeating this process, fragments of increasing length are assembled, expressed, and analyzed for immunoreactivity, and the cycle is reiterated until the full-length gene is assembled.

The invention provides mosaic proteins comprising a plurality of homologous antigenic peptides from different genotypes of a hepatitis virus. In particular, the invention provides mosaic proteins comprising a plurality of homologous antigenic nucleocapsid peptides from different genotypes of a hepatitis C virus. Further, the invention provides mosaic proteins comprising a plurality of homologous antigenic non-structural peptides from different genotypes of a hepatitis C virus. The mosaic proteins and genes encoding therefor can be used for immunologic detection or vaccination.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 is a schematic representation of the strategy used to insert each monomer into the cloning vector. Monomers were cloned into EcoRI and BamHI treated vector pCV3 by treatment of annealed oligonucleotides with the Klenow fragment of DNA Polymerase I followed by restriction endonuclease cleavage.

FIG. 2 is a schematic representation of the pCV3 vector design, and demonstrates that the pGEX-4T-2 vector (SEQ ID NO:53) was treated with BamHI and NotI to remove the multiple cloning site. The recovered cleaved vector was then treated with the Klenow fragment of DNA polymerase I in the presence of GTP and ATP to prevent self-annealing of the plasmid.

FIG. 3 shows the ligation of a new multiple cloning site sequence (SEQ ID NO:54) into cloning vector by DNA ligase to create the vector pCV3 (SEQ ID NO:55).

FIG. 4 is a schematic representation of the design of the cloned segment. The bar represents a specific immunoreactive epitope from an appropriate etiologic agent.

FIG. 5 depicts the cloning strategy of each monomer. The fragments were treated with EcoRI and BamHI followed by ligation into similarly treated pCV3.

FIG. 6 is a schematic representation of the PCR amplicons containing BbvI, EcoRI, BamHI, and FokI restriction sites, and the subsequent restriction of the first monomer with the FokI restriction enzyme.

FIG. 7 shows the restriction of a second consecutive monomer with the BbvI restriction enzyme.

FIG. 8 demonstrates the ligation of a FokI treated first monomer with a BbvI treated second monomer with DNA ligase to create a dimer.

FIG. 9 shows the amino acid sequence of each monomer comprising the NC mosaic protein. The bolded amino acids indicate the variations among the monomer sequences.

FIG. 10 depicts the strategy of assembly from individual NC monomers by REAL, resulting in a full-length gene composed of monomers ABCDEFGHIJK. The primary sequences of all cloned fragments were confirmed prior to assembly by REAL.

FIG. 11 shows the analysis of selected expressed and purified proteins by immunoblot. A single anti-HCV-positive human serum with high anti-NC activity by MATRIX was used as the source of primary antibody.

FIG. 12 shows the frequency distribution of 200 anti-HCV negative sera by NC Mosaic EIA for the detection of anti-NC activity. The cutoff value was statistically derived as the mean optical density (OD) value of negative sera plus 3.5 times the standard deviation of the mean. This value equals 0.145.

FIG. 13 shows an endpoint titration of specimen no. BBI 304 by NC Mosaic EIA and by MATRIX for the detection of anti-NC activity.

FIG. 14 shows an endpoint titration of specimen no. BBI 325 by NC Mosaic EIA and by MATRIX for the detection of anti-NC activity.

FIG. 15 depicts a comparison of the NC Mosaic EIA (OD y-axis) and MATRIX (S/C y-axis) for the detection of anti-NC activity in seroconversion no. 4812.

FIG. 16 depicts a comparison of the NC Mosaic EIA (OD y-axis) and MATRIX (S/C y-axis) for the detection of anti-NC activity in seroconversion no. 4813.

FIG. 17 depicts a comparison of the NC Mosaic EIA (OD y-axis) and MATRIX (S/C y-axis) for the detection of anti-NC activity in seroconversion no. 4814.

FIG. 18 shows the amino acid sequence of the NS4 mosaic protein. A total of 17 sequences, corresponding to the same domains from different subtypes and genotypes, were used. Domain 1691-1710 corresponds to the N-terminus of 5-1-1, while domain 1712-1733 corresponds to the C-terminus of 5-1-1. Domain 1921-1940 corresponds to region 59 (C-terminus of NS4).

FIG. 19 demonstrates the strategy of NS4 gene assembly by Restriction Endonuclease Assisted Ligation (REAL). The mosaic NS4 gene was assembled from 9 consecutive monomers (A, B, C, D, E, F, G, H, and I). The monomers were first assembled into 4 dimers (AB, CD, EF, and GH), and then 4 dimers were further assembled into 2 tetramers (ABCD and EFGH). The full length gene was constructed by the ligation of 2 tetramers and 1 monomer (I).

FIG. 20 shows the analysis of expressed and affinity-chromatography purified proteins of all monomers (A, B, C, D, E, F, G, H, and I), dimers (AB, CD, EF, and GH), tetramers (ABCD and EFGH), and the full-length gene (A-I) by immunoblot using a single anti-NS4-positive specimen diluted 1:200. Three monomers (A, H, and I) were not immunoreactive even when the specimen was diluted 1:10.

FIG. 21 depicts the frequency distribution of 160 anti-HCV negative sera and 166 anti-HCV positive sera by NS4 Mosaic EIA.

FIGS. 22A-B show the endpoint titration of specimen no. 1 (FIG. 22A) and no. 2 (FIG. 22B) by the NS4 Mosaic EIA and by MATRIX.

FIG. 23 shows a comparison of the percent positivity for anti-NS3, anti-NS4, and anti-NC activity by MATRIX versus anti-NS4 activity by NS4 Mosaic EIA using 182 anti-HCV positive sera.

FIGS. 24A-D demonstrate the sensitivity of the NS4 Mosaic EIA versus MATRIX for the detection of anti-NS4 activity in seroconversion panel nos. 4811 (FIG. 24A), 4812 (FIG. 24B), 6214 (FIG. 24C) and 4813 (FIG. 24D).

FIG. 25 shows the reactivity of genotype specific sera by NS4 Mosaic EIA for anti-NS4 activity.

DETAILED DESCRIPTION

The present invention relates to methods and compositions used to improve the sensitivity, the spectrum of immunoreactivity, and the specificity of antigens used as immunologic targets for detection. In preferred embodiments, the detection can be performed by enzyme immunoassay (EIA). The method, designated Restriction Endonuclease Assisted Ligation (REAL), involves the construction of an artificial gene from synthetic oligonucleotides. The compositions are synthetic proteins composed of a mosaic of broadly immunoreactive antigenic epitopes from several genotypes of a species, for example, hepatitis C virus (HCV). The mosaic protein compositions can be used for immunologic detection of, or vaccination against, the organisms from which they are derived. The invention further contemplates that the nucleic acids encoding the mosaic proteins can be used for immunologic detection of, or vaccination against, the organisms from which they are derived. Therefore, in addition to compositions and methods for detecting hepatitis, the invention provides a hepatitis vaccine comprising a mosaic protein, or a gene encoding therefor.

REAL employs the Klenow fragment of DNA Polymerase I to convert specially designed complementary oligonucleotides into double-stranded DNA fragments, which are subsequently amplified by PCR. Restriction sites engineered into the cloning vector are used to produce complementary overhangs for the addition of consecutive fragments. Each fragment may be cloned and expressed individually, for example in Escherichia coli, to determine its immunoreactivity, or may be assembled into the full-length product without cloning. Two consecutive fragments are subsequently ligated, amplified by PCR, cleaved with restriction endonucleases, and ligated with DNA ligase, so that each fragment is assembled into a longer fragment in a consecutive process. By repeating this process, fragments of increasing length are assembled, expressed, and analyzed for immunoreactivity, and the cycle is reiterated until the full-length gene is assembled.

In particular, the present invention provides a method of constructing an artificial gene, comprising synthesizing an initial oligonucleotide containing an initial gene segment encoding an initial gene product. The initial gene segment is flanked in the upstream direction (5') by an upstream initial ligating sequence, a first endonuclease recognition sequence that is recognized by a first endonuclease that cleaves at the first endonuclease recognition sequence, and a second endonuclease recognition sequence which is recognized by a second endonuclease that cleaves downstream of the first endonuclease recognition sequence and within the upstream initial ligating sequence.

Additionally, the initial gene segment is flanked in the downstream direction (3') by a downstream initial ligating sequence, a stop codon, a third endonuclease recognition sequence that is recognized by a third endonuclease that cleaves at the third endonuclease recognition sequence, and a fourth endonuclease recognition sequence which is recognized by a fourth endonuclease that cleaves upstream of the third endonuclease recognition sequence, upstream of the stop codon, and within the downstream initial ligating sequence. An example of such an initial oligonucleotide can be seen in FIG. 4, wherein "SEGMENT" designates the initial gene segment and "NNNNNN" designates the initial ligating sequence.

The method further comprises synthesizing a subsequent oligonucleotide containing a subsequent gene segment encoding a subsequent gene product. The subsequent gene segment is flanked in the upstream direction (5') by an upstream subsequent ligating sequence, a first endonuclease recognition sequence which is recognized by the first endonuclease that cleaves at the first endonuclease recognition sequence, and a second endonuclease recognition sequence which is recognized by the second endonuclease that cleaves downstream of the first endonuclease recognition sequence and within the upstream subsequent ligating sequence.

The subsequent gene segment is flanked in the downstream direction (3') by a downstream subsequent ligating sequence, a stop codon, a third endonuclease recognition sequence which is recognized by the third endonuclease that cleaves at the third endonuclease recognition sequence, and a fourth endonuclease recognition sequence which is recognized by the fourth endonuclease that cleaves upstream of the third endonuclease recognition sequence, upstream of the stop codon, and within the downstream subsequent ligating sequence. An example of such a subsequent oligonucleotide can be seen in FIG. 4, wherein "SEGMENT" designates the subsequent gene segment and "NNNNNN" designates the subsequent ligating sequence.

The method further comprises the step of cleaving the initial oligonucleotide with the fourth endonuclease, and cleaving the subsequent oligonucleotide with the second endonuclease. An example of this step is shown in FIGS. 6 and 7.

The method further comprises the step of ligating the initial oligonucleotide and the subsequent oligonucleotide together at the downstream initial ligating sequence of the initial oligonucleotide and the upstream subsequent ligating sequence of the subsequent oligonucleotide to form an artificial gene. An example of this step is shown in FIG. 8. The invention contemplates that additional subsequent oligonucleotides can be prepared, cleaved, and ligated in a like fashion to make any artificial gene.

In a preferred embodiment, the invention provides the subsequent step of cleaving the artificial gene with the first and third endonucleases and inserting the remaining artificial gene into a vector previously cleaved with the first and third endonucleases. This step permits insertion of the final vector construct into a living organism, such as E. coli, and the recombinant expression of the artificial gene to produce the mosaic protein.

The invention provides the unique opportunity, after the step of synthesizing the initial oligonucleotide and before the step of cleaving it with the fourth endonuclease, to confirm the operability of the initial oligonucleotide by cleaving it with the first and third endonucleases, inserting the remaining initial oligonucleotide into a vector previously cleaved with the first and third endonucleases, and expressing the initial gene segment. An example of such an additional step is shown in FIG. 5, wherein "SEGMENT" designates the initial gene segment and "NNNNNN" designates the initial ligating sequence. This step is made possible by the inclusion of the stop codon downstream of the gene segment, which is removed by cleavage with the fourth endonuclease in subsequent steps.

In preferred embodiments, the first and third endonucleases are EcoRI and BamHI, respectively. However, it will be understood that any endonucleases that cleave at their recognition sequences can be used, with the proviso that two different endonucleases are employed. Examples of other suitable restriction endonucleases include AflII, Alw44I, ApaI, ApaII, BclI, BglII, BspHI, BssHII, HindIII, KpnI, MluI, NarI, NcoI, PstI, SalI, or XhoI.

In preferred embodiments, the second and fourth endonucleases are BbvI and FokI, respectively. However, it will be understood that any endonuclease that cleaves downstream of the first endonuclease recognition sequence, or upstream of the third endonuclease recognition sequence, respectively, and within the ligating sequences, can be used, with the proviso that two different endonucleases are employed. Examples of other suitable restriction endonucleases that restrict the nucleic acid at a site away from the recognition site include: BspMI, HgaI, MboII, or SfaNI.
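The enzymes named above are Type IIS enzymes: each cuts at a fixed distance from its recognition sequence rather than within it. The short sketch below tabulates their commonly published cut positions in the usual site(top/bottom) notation (these offsets come from standard enzyme references, not from this specification) and derives the single-stranded end each enzyme leaves. The table and the `overhang` helper are hypothetical illustrations; the point is that FokI and BbvI both leave 4-nucleotide 5' overhangs, which is what lets them be paired in REAL.

```python
# Published cut geometry, written as (recognition site, top-strand offset,
# bottom-strand offset); offsets count nucleotides 3' of the site on the
# strand that carries it. Values are from standard enzyme references.
TYPE_IIS = {
    "FokI": ("GGATG", 9, 13),
    "BbvI": ("GCAGC", 8, 12),
    "BspMI": ("ACCTGC", 4, 8),
    "HgaI": ("GACGC", 5, 10),
    "SfaNI": ("GCATC", 5, 9),
    "MboII": ("GAAGA", 8, 7),
}


def overhang(enzyme: str) -> tuple[int, str]:
    """Return (length, kind) of the single-stranded end the enzyme leaves."""
    _site, top, bottom = TYPE_IIS[enzyme]
    diff = bottom - top
    if diff == 0:
        return 0, "blunt"
    # A later bottom-strand cut leaves the top strand protruding 5'.
    return abs(diff), "5'" if diff > 0 else "3'"


for name in TYPE_IIS:
    length, kind = overhang(name)
    print(f"{name}: {length}-nt {kind} overhang")
```

Because the cut falls outside the recognition sequence, the sticky end is formed from the adjacent ligating sequence, so the recognition site itself is discarded at each junction.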

The invention provides that in preferred embodiments the initial and subsequent gene segments encode antigenic regions of a homologous protein from different genotypes of a hepatitis virus. The invention contemplates, however, that the REAL technique can be used for the construction of any mosaic or chimeric protein. In preferred embodiments, the gene segments encode antigenic regions of homologous proteins of different genotypes of a hepatitis C virus. Preferably, the gene segments encode antigenic regions of a nucleocapsid protein or a non-structural protein of different genotypes of a hepatitis C virus.

The invention further provides an artificial gene constructed by the above methods. The invention further provides a mosaic protein encoded by the artificial gene constructed by the above methods. The invention further provides a method of detecting a hepatitis infection in an individual comprising combining a serum sample from the individual with the mosaic protein made by the above methods, and detecting the presence of antibody binding to the mosaic protein, the presence of binding indicating a hepatitis infection in the individual. Preferably, an enzyme immunoassay (EIA) is performed for detection of a hepatitis infection, as is described in Examples 2 and 3. The detection of antibody binding can be facilitated by the use of detectable moieties, such as fluorescence, radioisotopes or solid substrate capture.
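FIG. 12 describes the anti-NC cutoff as the mean OD of anti-HCV negative sera plus 3.5 standard deviations. A minimal sketch of that decision rule is shown below. It assumes the sample standard deviation (`statistics.stdev`) and treats an OD equal to the cutoff as reactive; the text specifies neither choice. The function names and OD values are hypothetical.

```python
import statistics


def eia_cutoff(negative_ods: list[float], k: float = 3.5) -> float:
    """Cutoff = mean OD of known-negative sera + k * sample standard deviation."""
    return statistics.mean(negative_ods) + k * statistics.stdev(negative_ods)


def is_reactive(sample_od: float, cutoff: float) -> bool:
    """Assumption: a specimen whose OD reaches the cutoff counts as reactive."""
    return sample_od >= cutoff


# Hypothetical OD readings from negative sera, for illustration only.
negatives = [0.05, 0.07, 0.09]
cutoff = eia_cutoff(negatives)
print(round(cutoff, 3))
print(is_reactive(0.20, cutoff), is_reactive(0.10, cutoff))
```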

The invention provides mosaic proteins comprising a plurality of homologous antigenic peptides from different genotypes of a hepatitis virus. In particular, the invention provides mosaic proteins comprising a plurality of homologous antigenic nucleocapsid peptides from different genotypes of a hepatitis C virus. Further, the invention provides mosaic proteins comprising a plurality of homologous antigenic non-structural peptides from different genotypes of a hepatitis C virus. The invention also provides the gene sequences which encode for such mosaic proteins.

In one preferred embodiment, the mosaic protein can comprise a plurality of homologous antigenic nucleocapsid peptides from different genotypes of a hepatitis C virus as set forth in the amino acid sequences set forth in SEQ ID NOs:23-33, detailed in Example 2, herein. In another preferred embodiment, the mosaic protein can comprise a plurality of homologous antigenic non-structural peptides from different genotypes of a hepatitis C virus as set forth in the amino acid sequence of SEQ ID NO:52, detailed in Example 3, herein. It will be understood that certain minor or silent amino acid modifications and/or substitutions can be made in the amino acid sequences, while maintaining the antigenic functionality of the mosaic proteins. It will also be understood that certain silent or wobble nucleotide modifications and/or substitutions can be made in the gene sequences, while maintaining the ability of the gene to be ligated by the REAL technique and the antigenic functionality of the encoded mosaic proteins.
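Silent substitutions of the kind described above can be checked by translating the original and modified genes and comparing the products. The sketch below uses the standard genetic code. The helper names are hypothetical, and the example is made up: a single synonymous change (GGA to GGT, both glycine) that removes a FokI recognition sequence (GGATG). This is the kind of edit Example 1 describes for eliminating internal restriction sites.

```python
BASES = "TCAG"
# Standard genetic code, ordered by first, second, third base in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence; '*' marks a stop codon."""
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def is_silent_edit(original: str, edited: str) -> bool:
    """True if the edited gene encodes exactly the same peptide."""
    return len(original) == len(edited) and translate(original) == translate(edited)


# Hypothetical example: GGA TGC (Gly-Cys) contains GGATG; GGT TGC does not.
before, after = "GGATGC", "GGTTGC"
print(is_silent_edit(before, after), "GGATG" in before, "GGATG" in after)
```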

This invention is further illustrated by the following examples, which are not to be construed in any way as imposing limitations upon the scope thereof. On the contrary, it is to be clearly understood that resort may be had to various other embodiments, modifications, and equivalents thereof which, after reading the description herein, may suggest themselves to those skilled in the art without departing from the spirit of the present invention and/or the scope of the appended claims. The references mentioned in this specification are hereby incorporated by reference in their entireties.

EXAMPLE 1 Restriction Endonuclease Assisted Ligation

Each monomer, or DNA segment encoding a peptide of interest, was prepared from synthetic oligonucleotides of 40-80 nucleotides that overlapped by 8-10 nucleotides and contained an EcoRI site at the 5'-terminus of the plus strand and a BamHI site at the 5'-terminus of the negative strand. More detailed examples of such monomers can be seen in SEQ ID NOs:1-22 and 34-51, described in Examples 2 and 3 herein. Single-stranded DNA was converted to double-stranded DNA by combining 50 pmol of each oligodeoxynucleotide, 2 μl BamHI buffer (Boehringer Mannheim, Indianapolis, Ind.), and 2.5 pmol dNTPs in a total volume of 20 μl. The mixture was heated to 95° C. for 1 minute and cooled on ice. After adding 1 μl of the Klenow fragment of DNA polymerase I (Promega, Madison, Wis.), the reaction mixture was incubated for 30 minutes at room temperature. An incubation step at 65° C. for 10 minutes was used to inactivate the DNA polymerase. The resulting double-stranded DNA molecule was cleaved with the restriction enzymes EcoRI and BamHI and ligated into pCV3 using the T4 DNA fast ligation kit (Boehringer Mannheim, Indianapolis, Ind.). (FIG. 1) All internal BbvI, EcoRI, BamHI, and FokI restriction endonuclease sites were previously removed from the synthetic gene by introducing synonymous codon substitutions.
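The Klenow step can be modeled at the sequence level: the plus-strand and negative-strand oligonucleotides anneal through their 3'-terminal overlap, and each is extended to the end of its partner. The sketch below assumes that SEQ ID NO:1 (plus strand, EcoRI site near its 5' end) and SEQ ID NO:2 (negative strand, BamHI site near its 5' end), listed in Example 2, form one such pair. It also assumes that the reading frame is the one implied by the codon grouping in which they are printed. The helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def fill_in(plus: str, minus: str, min_overlap: int = 8) -> tuple[str, int]:
    """Model Klenow extension of two oligos annealed at their 3' ends.

    Returns the plus-strand sequence of the double-stranded product and the
    length of the longest exact overlap used for annealing.
    """
    target = revcomp(minus)  # negative strand written in plus orientation
    for k in range(min(len(plus), len(target)), min_overlap - 1, -1):
        if plus[-k:] == target[:k]:
            return plus + target[k:], k
    raise ValueError("no overlap of at least min_overlap nucleotides")


def translate(seq: str) -> str:
    """Translate frame 0 with the standard code; '*' marks a stop codon."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


seq1 = ("CCC CGA ATT CAA CCG AAA CCG CAA CGT AAA ACC AAA CGT AAC ACC "
        "ATT CGT CGT C").replace(" ", "")
seq2 = ("CCC CGG ATC CTA TTT CGG ACC AAC GAT CTG ACC ACC ACC CGG GAA TTT "
        "AAC GTC CTG CGG ACG ACG AAT").replace(" ", "")
duplex, overlap = fill_in(seq1, seq2)
print(overlap, len(duplex))
print(translate(duplex))
```

For this pair the sketch finds a 10-nucleotide overlap, within the 8-10 nucleotide range stated above. The translated product runs through the coding region into a TAG stop codon that overlaps the BamHI site (TAGGATCC).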

The pCV3 vector was created by modifying the multiple cloning site in the expression vector pGEX-4T-2 (Pharmacia, Piscataway, N.J.). After cleaving the vector with BamHI and NotI to remove the internal EcoRI site, the recovered vector was treated with the Klenow fragment of DNA polymerase I (Promega, Madison, Wis.) in the presence of GTP and ATP to prevent self-annealing of the restricted plasmid. (FIG. 2) Subsequently, a double-stranded DNA molecule was prepared by annealing two complementary oligonucleotides containing one EcoRI site flanked by a BbvI site and one BamHI site flanked by a FokI site. This DNA molecule was inserted into the multiple cloning site of the modified vector with DNA ligase. (FIG. 3) The final structure of pCV3 was confirmed by restriction enzyme analysis and by DNA sequencing.

After each monomer was inserted into the pCV3 vector, it was amplified by PCR using plasmid-specific primers, thereby acquiring BbvI and FokI sites, and each structure was confirmed by restriction endonuclease analysis and DNA sequencing. FIG. 4 shows the elements of each cloned monomer or segment of the synthetic gene. The solid line separating each strand represents the coding sequence of each monomer, which is followed by a stop codon so that each fragment may be expressed individually. The EcoRI and BamHI sites are used for cloning, while the BbvI and FokI sites are used to remove the EcoRI and BamHI sites and to produce overhangs complementary to the next consecutive monomer.
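Restriction analysis of each construct can be supported by a simple sequence scan, as can the requirement in this example that the synthetic gene contain no internal BbvI, EcoRI, BamHI, or FokI sites. The hypothetical helper `find_sites` below searches both strands for the four recognition sequences. EcoRI and BamHI sites are palindromic, but BbvI (GCAGC) and FokI (GGATG) are not, so their reverse complements must be searched as well. Only exact matches are reported, and the example sequences are made up.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Recognition sequences of the four enzymes used in REAL.
SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "BbvI": "GCAGC",
    "FokI": "GGATG",
}


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(seq: str) -> dict[str, list[int]]:
    """0-based start positions (on the given strand) of each site on either strand."""
    hits: dict[str, list[int]] = {}
    for name, site in SITES.items():
        positions = set()
        for probe in {site, revcomp(site)}:
            start = seq.find(probe)
            while start != -1:
                positions.add(start)
                start = seq.find(probe, start + 1)
        if positions:
            hits[name] = sorted(positions)
    return hits


# A coding segment is ready for assembly only if the scan comes back empty.
print(find_sites("AAGAATTCAAGGATCCAA"))
print(find_sites("TTCATCCTT"))
```

The second call finds FokI only through its reverse complement (CATCC), which is how a FokI site placed on the antisense strand appears when the sense strand is read.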

The process of consecutive assembly of monomers or fragments into a synthetic gene is illustrated in FIGS. 5-8. The first monomer is restricted with EcoRI and BamHI and ligated into similarly treated pCV3. Following amplification by PCR, each segment acquired the restriction sites BbvI and FokI (FIG. 4). Restriction with BbvI creates a 5' overhang 8 base pairs downstream of a GCAGC sequence, whereas restriction with FokI creates a 5' overhang 9 base pairs downstream of a GGATG sequence. Therefore, the FokI site was placed in reverse orientation on the antisense strand, so that FokI cuts upstream of its site and leaves an overhang at the 3' end of a segment. The BbvI and FokI sites were situated 8 and 9 base pairs, respectively, from 4-nucleotide complementary sequences, so that consecutive monomers could be treated alternately with FokI and BbvI to produce complementary overhangs: one at the 3' end of the first segment and one at the 5' end of the second segment (FIGS. 6 and 7). The restricted segments were separated by agarose gel electrophoresis using 25% low melting point agarose and stained with ethidium bromide. Bands of the correct size were cut out and recovered by melting the agarose, and the DNA of each segment was purified using the Wizard PCR Preps DNA Purification System (Promega, Madison, Wis.) following the manufacturer's instructions.

Subsequently, the first amplified segment was treated with FokI to produce a 4-nucleotide overhang complementary to the overhang created on the second monomer by treatment with BbvI. The treated segments were each purified and ligated with DNA ligase. When two monomers were treated as described above, a dimer was formed that could subsequently be treated with EcoRI and BamHI and cloned into similarly treated pCV3. Because each segment was supplied with its own stop codon, each monomer, dimer, and multimer could be expressed and analyzed individually.
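The junction formed in this step can be traced at the sequence level. In the hypothetical sketch below, the first segment carries its FokI site in reverse orientation downstream of the coding sequence (read on the sense strand as CATCC), and the second segment carries a BbvI site (GCAGC) upstream. The sketch uses the standard FokI (9/13) and BbvI (8/12) cut distances. Under these values, the first segment keeps a 4-nucleotide overhang on its antisense strand and the second keeps one on its sense strand. The two overhangs anneal when the four bases, read on the sense strand, are identical. The function names and sequences are illustrative only; the filler bases stand in for the sequence each cut removes, which in the first segment would include the stop codon and the BamHI site.

```python
def cut_with_foki(seq: str) -> tuple[str, str]:
    """Cut the 3' end of a segment whose FokI site sits on the antisense strand.

    On the sense strand the site reads CATCC (reverse complement of GGATG).
    FokI cuts 9 nt from the site on the antisense strand and 13 nt on the
    sense strand, both upstream here, leaving a 4-nt 5' overhang.
    Returns (retained sense sequence, overhang in sense-strand coordinates).
    """
    q = seq.find("CATCC")
    if q < 13:
        raise ValueError("no usable FokI site")
    return seq[: q - 13], seq[q - 13 : q - 9]


def cut_with_bbvi(seq: str) -> tuple[str, str]:
    """Cut the 5' end of a segment at a sense-strand BbvI site (GCAGC 8/12)."""
    p = seq.find("GCAGC")
    if p == -1 or p + 17 > len(seq):
        raise ValueError("no usable BbvI site")
    return seq[p + 17 :], seq[p + 13 : p + 17]


def real_ligate(first: str, second: str) -> str:
    """Join FokI-cut `first` to BbvI-cut `second` if their overhangs pair."""
    body1, end1 = cut_with_foki(first)
    body2, end2 = cut_with_bbvi(second)
    if end1 != end2:
        raise ValueError(f"overhangs do not pair: {end1} vs {end2}")
    return body1 + end1 + body2


# Hypothetical segments sharing the 4-nt ligating sequence CCGA.
first = "ATGGCT" + "CCGA" + "T" * 9 + "CATCC" + "GG"
second = "GG" + "GCAGC" + "T" * 8 + "CCGA" + "AAACGT"
print(real_ligate(first, second))
```

The ligated product contains the shared ligating sequence once and neither recognition site, so consecutive monomers are joined without leftover restriction sites at the junction.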

EXAMPLE 2 Design and Production of an Artificial Nucleocapsid Mosaic Protein

Analysis of data derived from GenBank

The N-terminal region spanning amino acids 5-33 of the nucleocapsid (NC) protein was selected as the region for developing a mosaic protein because it contains several strong and broadly immunoreactive antigenic epitopes (unpublished data). All known HCV sequences deposited in GenBank were analyzed using the program FASTA (Wisconsin Package 9.0, Genetics Computer Group (GCG), Madison, Wis.). Out of 77 available sequences demonstrating variations in this region, 11 variants representing 3 different HCV genotypes were selected for designing an artificial mosaic gene. FIG. 9 presents the amino acid sequence of each of the 11 variants, referred to as fragments or monomers, comprising the mosaic NC protein.
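Reducing a set of database sequences to distinct variants and flagging the positions at which they differ (the positions shown in bold in FIG. 9) can be sketched as follows. This assumes the peptide sequences have already been aligned to equal length over the 5-33 region, which the sketch does not do. The helper names and peptides are hypothetical, not the FIG. 9 sequences.

```python
def unique_variants(sequences: list[str]) -> list[str]:
    """Collapse identical sequences, keeping first-seen order."""
    return list(dict.fromkeys(sequences))


def variable_positions(variants: list[str]) -> list[int]:
    """1-based alignment columns at which the variants are not all identical."""
    length = len(variants[0])
    if any(len(v) != length for v in variants):
        raise ValueError("variants must be pre-aligned to equal length")
    return [i + 1 for i in range(length) if len({v[i] for v in variants}) > 1]


# Hypothetical aligned peptide fragments, for illustration only.
found = ["PKPQRKTKRNTNRR", "PKPQRKTKRNTIRR", "PKPQRKTKRNTNRR", "PKPQRQTKRNTNRR"]
variants = unique_variants(found)
print(len(variants), variable_positions(variants))
```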

Synthetic gene assembly

The construction of the artificial gene was accomplished using the previously described Restriction Endonuclease Assisted Ligation (REAL) method and the following synthetic oligonucleotides:

    (SEQ ID NO:1) CCC CGA ATT CAA CCG AAA CCG CAA CGT AAA ACC AAA CGT AAC ACC ATT CGT CGT C;
    (SEQ ID NO:2) CCC CGG ATC CTA TTT CGG ACC AAC GAT CTG ACC ACC ACC CGG GAA TTT AAC GTC CTG CGG ACG ACG AAT;
    (SEQ ID NO:3) CCC CGA ATT CAA CCG AAA CCG CAA CGT CAG ACC AAA CGT AAC ACC AAC CGT CGT;
    (SEQ ID NO:4) CCC CGG ATC CTA TTT CGG ACC AAC GAT CTG ACC ACC ACC CGG GAA TTT AAC GTC CTG CGG ACG ACG GTT G;
    (SEQ ID NO:5) CCC CCA ATT CAA CCG AAA CCG CAA CGT AAA ACC AAA CGT AAC ACC TAC CGT;
    (SEQ ID NO:6) CCC CGG ATC CTA TTT CGG ACC AAC GAT CTG ACC ACC ACC CGG GAA TTT AAC GTC CTG CGG ACG ACG GTA GGT G;
    (SEQ ID NO:7) CCC CGA ATT CAA CCG AAA CCG CAA CGT AAA CCG AAC CGT AAC ACC AAC CGT CGT C;
    (SEQ ID NO:8) CCC CGG ATC CTA TTT CGG ACC AAC GAT CTG ACC ACC ACC CGG GAA TTT AAC GTC CTG CGG ACG ACG GTT;
    (SEQ ID NO:9) CCC CGA ATT CAA CCG AAA CCG CAA CGT CAG CCG AAA CGT AAC ACC CCG CGT CGT CCG CAG GAC;
    (SEQ ID NO:10) CCC CCG ATC CTA TTT CGG ACC AAC GAT CTG ACC ACC ACC CGG GAA TTT AAC GTC CTG CGG;
    (SEQ ID NO:11) CCC CGA ATT CAA CCG AAA CCG CAA CGT AAA ACC AAA CGT AAC GCT CAC CGT CGT C;
    (SEQ ID NO:12) CCC CGG ATC CTA TTT CGG ACC AAC GAT CTG ACC ACC ACC CGG GAA TTT AAC GTC CTG CGG ACG ACG GTG;
    (SEQ ID NO:13) CCC CGA ATT CAA CCG AAA CCG CAA CGT AAA AAC CAG CGT AAC ACC AAC CGT CGT C;
    (SEQ ID NO:14) CCC CGG ATC CTA TTT CGG ACC AAC GAT CTG ACC ACC ACC CGG GAA TTT AAC GTC CTG CGG ACG ACG GTT;
    (SEQ ID NO:15) CCC CGA ATT CAA CCG AAA CCG CAA CGT AAA ACC AAA CGT AAC ACC ATT CGT CGT C;
                                 (SEQ ID NO:16)                                                                  CCC CGG ATC CTA TTT CGG AAC GTA GAT AAC                                       ACC ACC ACC CGG GAA TTT AAC GTC CTG CGG ACG                                    ACG AAT;                                                                       (SEQ ID NO:17)                                                                  CCC CGA ATT CAA CCG AAA CCG CAA CGT AAA                                       ACC GAA CGT AAC ACC AAC CGT CGT CC;                                            (SEQ ID NO:18)                                                                  CCC CGG ATC CTA TTT CGG ACC AAC GAT CTG                                       ACC ACC ACC AGA GAA ACG AAC GTC CGG ACG ACG GT;                                (SEQ ID NO:19)                                                                  CCC CGA ATT CAA CCG AAA CCG AAA CGT CAG                                       ACC AAA CGT AAC ACC CTG CGT CGT;                                               (SEQ ID NO:20)                                                                  CCC CGG ATC CTA TTT CGG ACC AAC GAT CTG                                       ACC ACC AGC CGG GAA TTT AAC GTT TTT CGG ACG                                    ACG ACG CAG G;                                                                 (SEQ ID NO:21)                                                                  CCC CGA ATT CAA CCG AAA CCG CAA CGT AAA                                       ACC AAA CGT AAA GCT CAC CGT CGT C;                                             (SEQ ID NO:22)                                                                  CCC CGG ATC CTA TTT CGG ACC AAC GAT CTG                                       ACC ACC ACC CGG GAA TTT AAC GTC CTG CGG ACG                                    ACG GTG.                                                                  
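The oligonucleotides above come in forward/reverse pairs whose 3' ends are complementary, so each pair can anneal and be filled in to a duplex. As an observation only, the forward oligonucleotides begin with CCCC GAATTC (an EcoRI recognition sequence) and the reverse ones with CCCC GGATCC (a BamHI recognition sequence); this section does not state how those sites were used. The following is a minimal sketch, assuming exact 3' complementarity and modelling the polymerase fill-in as simple string extension, applied to SEQ ID NO:1 and NO:2. The helper names are hypothetical.

```python
# Hypothetical helpers: model annealing of one forward/reverse oligo pair at
# their 3' ends and the fill-in that a polymerase would perform.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string containing only A/C/G/T."""
    return seq.translate(COMPLEMENT)[::-1]


def three_prime_overlap(fwd: str, rev: str, min_len: int = 6) -> int:
    """Longest suffix of fwd equal to a prefix of revcomp(rev); 0 if none."""
    rc = revcomp(rev)
    for k in range(min(len(fwd), len(rc)), min_len - 1, -1):
        if fwd[-k:] == rc[:k]:
            return k
    return 0


def fill_in(fwd: str, rev: str) -> str:
    """Top strand of the duplex obtained after extending both 3' ends."""
    k = three_prime_overlap(fwd, rev)
    if k == 0:
        raise ValueError("oligonucleotides do not anneal at their 3' ends")
    return fwd + revcomp(rev)[k:]


SEQ1 = "CCCCGAATTCAACCGAAACCGCAACGTAAAACCAAACGTAACACCATTCGTCGTC"
SEQ2 = ("CCCCGGATCCTATTTCGGACCAACGATCTGACCACCACCCGGGAATTTAACGTCCTGCGG"
        "ACGACGAAT")

duplex = fill_in(SEQ1, SEQ2)
print(three_prime_overlap(SEQ1, SEQ2))  # 10 (ATTCGTCGTC)
print(len(duplex))                      # 114
```

Here the two oligonucleotides (55 and 69 nt) share a 10-nt 3' overlap, so the filled-in duplex is 55 + 69 - 10 = 114 bp and carries the EcoRI and BamHI sequences at opposite ends.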

Each pair of oligonucleotides was converted into double-stranded DNA with the Klenow fragment of DNA polymerase I and subsequently cloned, yielding 11 monomers, each encoding 28 amino acids. Before sequential assembly by REAL, each fragment was amplified by the polymerase chain reaction and shown to have the expected size by agarose gel electrophoresis with ethidium bromide staining. The primary structure of each fragment was also confirmed by sequencing. To begin assembly of the mosaic protein, pairs of consecutive monomers were joined to form 5 dimers (AB, CD, EF, HI, and JK), as shown in FIG. 10. In the next step, the remaining monomer, G, was fused with dimer EF to form the trimer EFG, while the other consecutive dimers were joined to form the tetramers ABCD and HIJK. Fragments EFG and HIJK were then assembled into a septamer (EFGHIJK), and the full-length, 924 base pair gene was completed by adding the tetramer ABCD to the septamer.
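The assembly order described above can be written down as a small tree and checked for consistency. This is a minimal sketch, assuming each monomer contributes 84 bp of coding sequence (924 bp divided by 11 monomers, i.e., 28 codons); the dictionary layout and function names are hypothetical.

```python
# Sketch of the NC assembly order (FIG. 10). Assumption: every monomer
# contributes 84 bp (28 codons x 3 nt) to the final 924 bp gene.
MONOMER_BP = 84

# Each product maps to the two fragments joined in that ligation step.
ASSEMBLY = {
    "AB": ("A", "B"), "CD": ("C", "D"), "EF": ("E", "F"),
    "HI": ("H", "I"), "JK": ("J", "K"),
    "EFG": ("EF", "G"),
    "ABCD": ("AB", "CD"), "HIJK": ("HI", "JK"),
    "EFGHIJK": ("EFG", "HIJK"),
    "ABCDEFGHIJK": ("ABCD", "EFGHIJK"),
}


def length_bp(fragment: str) -> int:
    """Recursively sum monomer lengths for a fragment named in ASSEMBLY."""
    if fragment not in ASSEMBLY:  # a single monomer
        return MONOMER_BP
    left, right = ASSEMBLY[fragment]
    return length_bp(left) + length_bp(right)


def check_order(assembly: dict) -> None:
    """Raise if any ligation would join fragments out of their original order."""
    for product, (left, right) in assembly.items():
        if left + right != product:
            raise ValueError(f"{left} + {right} does not give {product}")


check_order(ASSEMBLY)
for name in ("EFG", "HIJK", "EFGHIJK", "ABCDEFGHIJK"):
    print(name, length_bp(name))
# 11 monomers plus 10 ligation products = 21 constructs to express
print(11 + len(ASSEMBLY))
```

Counting the 11 monomers together with the 10 ligation products gives 21 constructs, which matches the number of purified proteins reported for this example.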

Protein expression and purification

Each fragment was expressed as a fusion to the C-terminus of glutathione S-transferase by transforming competent Escherichia coli JM109 cells (Invitrogen, San Diego, Calif.) with the plasmid containing that fragment. Cells were grown in LB medium containing 100 μg ampicillin per ml in a bacterial shaker at 37° C. until the optical density at 600 nm reached 0.6. Protein expression from the tac promoter was induced by adding isopropyl-β-D-thiogalactoside (IPTG) to a final concentration of 1 mM. After 1 hour of growth at 30° C., the cells were harvested, and a lysate was prepared following the procedure described by Sambrook J., et al., Molecular Cloning: A Laboratory Manual, 2nd ed., p. 17.38, Cold Spring Harbor Laboratory Press, New York, 1989. The glutathione S-transferase-mosaic fusion proteins were then purified by affinity chromatography on glutathione-Sepharose columns (Pharmacia, Piscataway, N.J.) (Smith D. B. and Johnson K. S., Gene 67:37-40, 1988).
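Reaching the 1 mM final IPTG concentration is a C1V1 = C2V2 calculation. The sketch below assumes a 1 M (1000 mM) IPTG stock and a 50 ml culture; both values are hypothetical and are not taken from the protocol above.

```python
def stock_volume_ul(stock_mM: float, final_mM: float, culture_ml: float) -> float:
    """Microlitres of stock to add so the culture reaches final_mM.

    Uses C1*V1 = C2*V2 and ignores the small volume the stock itself adds.
    """
    if stock_mM <= final_mM:
        raise ValueError("stock must be more concentrated than the target")
    return final_mM * culture_ml * 1000.0 / stock_mM


# Hypothetical example: 1 M IPTG stock added to a 50 ml culture.
print(stock_volume_ul(1000, 1, 50))  # 50.0
```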

Analysis of expressed NC fragments

E. coli cells were transformed with plasmid constructs containing each of the PCR-amplified fragments. After induction with IPTG, crude lysates were prepared, and analysis by 12% SDS-PAGE showed high yields of proteins of the expected molecular mass (data not shown). A comparison of induction conditions indicated that induction with 1 mM IPTG for 1 hour at 30° C. gave the highest yield of soluble mosaic fusion proteins (data not shown). In the resulting lysates, the proportion of soluble protein was estimated to be about 50-60%. Each expressed NC fragment was purified by affinity chromatography according to the manufacturer's recommendations and analyzed by 12% SDS-PAGE with Coomassie blue staining. All 21 purified proteins were highly pure and migrated at their expected molecular weights. An artifactual doublet was present in many of the samples; this pattern is typical of the glutathione S-transferase (GST) expression system. The full-length NC mosaic protein migrated as a single band with an estimated molecular weight of 61 kDa.
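The observed 61 kDa band can be compared with a rough estimate of the fusion's expected size. This sketch assumes approximately 26 kDa for GST and an average residue mass of about 110 Da, and it ignores any linker residues. These are common rules of thumb, not values given in the source.

```python
# Rough expected size of a GST fusion. Assumptions (not from the source):
# GST is ~26 kDa, the average residue mass is ~110 Da, and linkers are ignored.
GST_KDA = 26.0
AVG_RESIDUE_KDA = 0.110


def fusion_mass_kda(insert_bp: int) -> float:
    """Estimate fusion mass (kDa) from the length of the inserted coding sequence."""
    if insert_bp % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    residues = insert_bp // 3
    return GST_KDA + residues * AVG_RESIDUE_KDA


print(round(fusion_mass_kda(924), 1))  # 59.9
```

The 924 bp gene encodes 308 residues, which gives an estimate of about 60 kDa. That is close to the observed band, given how approximate the assumptions are.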

Immunoblot assay

To verify the immunoreactivity of each fragment, the GST-mosaic fusion proteins were analyzed by immunoblot using an anti-HCV positive sample with high anti-NC activity by MATRIX. Nitrocellulose membranes carrying the immobilized proteins were incubated for 1 hour with anti-NC positive human sera diluted 1:200 in washing solution (0.1 M PBS, pH 7.2, containing 1% BSA and 0.5% Tween 20). The membranes were washed three times with washing solution and then incubated for 1 hour with affinity-purified goat anti-human immunoglobulin G conjugated to horseradish peroxidase (Bio-Rad, Richmond, Calif.) diluted 1:5000 in washing solution. After washing, diaminobenzidine and hydrogen peroxide were added to develop the color reaction. As shown in FIG. 11 (asterisks indicate the location of specific immunoreactivity), each of the purified proteins was immunoreactive, suggesting that the immunoreactive epitopes are accessible. The monomers were the least immunoreactive, and immunoreactivity increased with fragment size. Many of the lanes corresponding to the higher molecular weight fragments also show specific reactivity with proteolytic cleavage products. Although FIG. 11 shows data for 16 of the 21 proteins, the remaining 5 proteins behaved similarly.

NC Mosaic EIA

Twenty nanograms of full-length, affinity-purified GST-mosaic NC fusion protein in PBS (pH 7.5) was added to microtiter wells (Immulon II; Dynatech Laboratories, Inc., Chantilly, Va.) and allowed to adsorb at room temperature for 12 hours, after which the wells were blocked with 10% normal goat serum (NGS) and 1% BSA in PBS for 2 hours at 37° C. Human sera diluted 1:500 in 0.1 M phosphate-buffered saline, pH 7.5, containing 0.1% Tween 20 and 10% NGS were added and incubated for 1 hour at 37° C. After washing, goat anti-human IgG conjugated to horseradish peroxidase, diluted 1:5000 in 0.1 M PBS, pH 7.5, containing 0.1% Tween 20 and 10% NGS, was added, and the wells were incubated for 1 hour at 37° C. The wells were then incubated with substrate for ten minutes in the dark. Acid was added to stop the reaction, and optical density (OD) was measured at 490 nm.

Serum samples

Several collections of specimens were used to characterize the various fragments and to evaluate the NC Mosaic EIA: 1.) 128 anti-HCV positive specimens obtained from paid plasma donors (Boston Biomedica Inc., West Bridgewater, Mass.), 2.) a collection of specimens from normal blood donors, negative for anti-HCV activity, held in a repository at CDC, 3.) 21 anti-HCV positive, genotyped specimens (Boehringer Mannheim, Mannheim, Germany), and 4.) 4 anti-HCV positive seroconversion panels (Serologicals Inc., Clarkston, Ga.).

NC Mosaic EIA Results

A frequency distribution of OD values was obtained by testing 200 anti-HCV negative specimens by EIA, and a cutoff value was derived statistically from it (FIG. 12). The cutoff was set at the mean OD plus 3.5 standard deviations, or 0.145, and specimens with OD values greater than this cutoff were considered positive. With this cutoff, one anti-HCV negative specimen gave an OD value equal to 0.145 and was therefore interpreted as negative, giving an overall specificity of 100%.
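The cutoff rule above (mean OD of negatives plus 3.5 standard deviations, where "positive" means strictly greater than the cutoff) can be sketched as follows. The OD values are made up for illustration. The use of the sample standard deviation is an assumption, because the source does not say which estimator was used.

```python
from statistics import mean, stdev


def cutoff(negative_ods, k=3.5):
    """Cutoff = mean OD of negative specimens + k sample standard deviations."""
    return mean(negative_ods) + k * stdev(negative_ods)


def is_reactive(od, cut):
    """Reactive only if the OD is strictly greater than the cutoff."""
    return od > cut


# Hypothetical negative-control ODs (not data from this study).
negatives = [0.05, 0.07, 0.06, 0.08, 0.04]
cut = cutoff(negatives)
print(round(cut, 4))              # 0.1153
print(is_reactive(0.145, 0.145))  # False: an OD equal to the cutoff is negative
```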

Two serially diluted specimens, BBI 304 and BBI 325, were tested by the NC Mosaic EIA and by MATRIX to determine relative sensitivities. The results were expressed as sample-to-cutoff (S/CO) ratios so that the two tests could be compared directly (FIGS. 13 and 14, respectively). An S/CO value greater than 1 is considered positive. Specimen BBI 325 reached an endpoint by MATRIX at a dilution of 1:256,000. The NC Mosaic EIA gave an S/CO value of 1.8 at that dilution. However, the S/CO values at the 1:64,000 and 1:128,000 dilutions suggest that this value may not be accurate and that the true endpoint by NC Mosaic EIA may be at a dilution of 1:32,000. That would make the EIA 8-fold less sensitive than MATRIX for this specimen. Conversely, specimen BBI 304 gave an endpoint titer of 1:128,000 by MATRIX, while the NC Mosaic EIA was still positive at a dilution of 1:256,000, suggesting that the EIA was 2-fold more sensitive than MATRIX for this specimen. It is not unusual for two assays to give different endpoint titers on the same sample, since their immunologic targets are very different. The different endpoint titers obtained by these two assays on the same sera most probably reflect the relative titers of antibodies to different antigenic epitopes as they are presented within each test format.
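The S/CO comparison and the fold difference between endpoint titers can be expressed in a few lines. This is a sketch under stated assumptions. Dilutions are given as reciprocals in increasing order. The endpoint is taken as the last dilution that is still positive before the first negative one, which mirrors the reasoning above that an isolated positive beyond negative dilutions may not be accurate. The S/CO series in the example is invented.

```python
import math


def s_co(sample_od: float, cutoff_od: float) -> float:
    """Sample-to-cutoff ratio; values greater than 1 are read as positive."""
    return sample_od / cutoff_od


def endpoint_titer(dilutions, ratios):
    """Reciprocal of the last dilution with S/CO > 1 before the first negative.

    Assumes dilutions are reciprocals listed in increasing order.
    """
    titer = None
    for dilution, ratio in zip(dilutions, ratios):
        if ratio > 1:
            titer = dilution
        else:
            break
    return titer


def fold_difference(titer_a: float, titer_b: float):
    """Ratio of reciprocal endpoint titers and the equivalent twofold steps."""
    ratio = titer_a / titer_b
    return ratio, math.log2(ratio)


# Hypothetical S/CO series (invented values).
print(endpoint_titer([32000, 64000, 128000, 256000], [3.1, 1.6, 1.2, 0.7]))  # 128000
# BBI 304 endpoints from the text: EIA >= 1:256,000 vs MATRIX 1:128,000
print(fold_difference(256000, 128000))  # (2.0, 1.0)
```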

To measure clinical sensitivity, three seroconversion panels (Nos. 4812, 4813, and 4814) were tested by the NC Mosaic EIA and by MATRIX (FIGS. 15, 16, and 17, respectively). A cutoff value of 2.5 times background was used for the NC Mosaic EIA, while an S/CO value greater than 1.0 was used for MATRIX. In all three seroconversion panels, the two assays detected anti-NC activity at approximately the same number of days after transfusion.
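For a seroconversion panel, the day of first detection is simply the first bleed whose signal clears the threshold. This sketch assumes the "2.5 times background" rule means an OD at least 2.5 times the background OD. The panel values are invented for illustration.

```python
def first_reactive_day(days, ods, background, factor=2.5):
    """First bleed day with OD >= factor x background, or None if none qualify."""
    threshold = factor * background
    for day, od in zip(days, ods):
        if od >= threshold:
            return day
    return None


# Hypothetical panel (invented values): days after transfusion and EIA ODs.
days = [0, 14, 28, 42, 56]
ods = [0.05, 0.06, 0.09, 0.40, 1.20]
print(first_reactive_day(days, ods, background=0.05))  # 42
```

When the same function is applied to both assays, the two results can be compared bleed by bleed. The resolution of any comparison is limited by the spacing of the bleed dates.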

Another way to measure clinical sensitivity is to test a panel of anti-HCV positive sera for anti-NC activity by NC Mosaic EIA and by MATRIX. A panel of 128 specimens obtained from professional plasma donors tested positive by a commercially available EIA screening assay. Among the 128 initially reactive specimens, 109 were confirmed as positive by MATRIX, 12 tested indeterminate, and 7 tested negative. Among the 109 confirmed anti-HCV positive specimens, 101 (92.7%) demonstrated anti-NC activity by MATRIX and 99 (90.8%) by NC Mosaic EIA, suggesting a slightly higher sensitivity for MATRIX. Among the 12 indeterminate specimens, 6 demonstrated anti-NC activity by MATRIX and 3 by NC Mosaic EIA, suggesting a higher specificity for the NC Mosaic EIA. None of the 7 anti-HCV negative specimens was positive for anti-NC activity by either test (data not shown).
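The percentages for the plasma-donor panel follow directly from the reported counts. The short check below shows that 101 of 109 rounds to 92.7% and 99 of 109 rounds to 90.8%. The helper name is hypothetical.

```python
def percent(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Counts reported for the 128-specimen plasma-donor panel.
confirmed, indeterminate, negative = 109, 12, 7
if confirmed + indeterminate + negative != 128:
    raise ValueError("categories should account for all 128 specimens")

print(percent(101, 109))  # 92.7  anti-NC by MATRIX
print(percent(99, 109))   # 90.8  anti-NC by NC Mosaic EIA
```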

In another study, among 78 initially reactive specimens, 66 were confirmed as anti-HCV positive by MATRIX, one tested indeterminate, and 3 tested negative. The NC Mosaic EIA gave results concordant with MATRIX for anti-NC activity for the 66 positive samples and for the one negative specimen. The indeterminate specimen tested negative for anti-NC activity by NC Mosaic EIA, suggesting a higher specificity of the EIA for this specimen. The remaining 8 specimens were known to have nonspecific reactivity to the NS4 antigen but tested negative for anti-NC activity by both assays (data not shown).

Finally, 23 anti-HCV positive sera representing genotypes 1-5 were tested for anti-NC activity by NC Mosaic EIA and by MATRIX. The results showed 100% concordance between the two assays (data not shown). This indicates that the mosaic NC protein, although composed of sequences from genotypes 1-3, contains cross-reacting epitopes that react with anti-NC positive sera from individuals infected with 5 different genotypes. Collectively, these results suggest that the NC mosaic protein, when used as the immunologic target in an EIA format, is at least as sensitive as MATRIX and possibly more specific for the detection of anti-NC activity.

EXAMPLE 3 Design and Production of an Artificial NS4 Mosaic Protein

Design of an artificial NS4 mosaic protein

To construct an artificial NS4 antigen containing antigenic epitopes from several HCV genotypes, GenBank was searched for all sequences of the 5-1-1 region, as well as of a strongly immunoreactive region located at the C-terminus of NS4. Representative regions from different genotypes were selected on the basis of significant sequence divergence from each other and are shown in FIG. 18.
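The source does not say how sequence divergence was measured or how representatives were chosen. As one illustrative possibility only, the sketch below computes the p-distance (fraction of differing positions) between aligned, equal-length peptides and then greedily picks a set of mutually divergent sequences using max-min selection. The function names and the toy peptides are hypothetical and are not HCV sequences.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of aligned positions at which two equal-length peptides differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def pick_divergent(seqs: dict, n: int) -> list:
    """Greedy max-min selection of n mutually divergent sequences."""
    names = list(seqs)
    if n >= len(names):
        return names
    # Seed with the most distant pair.
    first = max(combinations(names, 2),
                key=lambda pair: p_distance(seqs[pair[0]], seqs[pair[1]]))
    chosen = list(first)
    # Add the sequence whose nearest already-chosen neighbour is farthest away.
    while len(chosen) < n:
        rest = [x for x in names if x not in chosen]
        best = max(rest, key=lambda x: min(p_distance(seqs[x], seqs[c]) for c in chosen))
        chosen.append(best)
    return chosen[:n]


# Hypothetical aligned toy peptides.
toy = {"s1": "AAAAAAAAAA", "s2": "AAAAAAAAAC",
       "s3": "CCCCCAAAAA", "s4": "CCCCCCCCCC"}
print(pick_divergent(toy, 3))  # ['s1', 's4', 's3']
```

With this approach, near-duplicates such as s2 are skipped in favour of sequences that add new variation. That is the intent of selecting regions that diverge significantly from each other.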

Gene Assembly

The amino acid sequence shown in FIG. 18 was back-translated into a nucleotide sequence, and synthetic oligonucleotides were used to construct nine monomers (the first eight each consisting of two antigenic domains and the ninth consisting of one antigenic domain), which were consecutively assembled using Restriction Endonuclease Assisted Ligation (REAL). The synthetic oligonucleotides used were as follows:
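Back-translation maps each residue to a codon. Many DNA sequences encode the same peptide, so a codon must be chosen for every residue. The single-codon table below is a hypothetical choice for illustration and is not the one used by the authors. Several of its codons, such as CCG (Pro), AAA (Lys), CGT (Arg) and ACC (Thr), also occur in the NC oligonucleotides listed earlier. A translation step is included to check the round trip.

```python
# Hypothetical single-codon table (one valid codon per residue, plus a stop).
CODON = {
    "A": "GCT", "R": "CGT", "N": "AAC", "D": "GAC", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGT", "H": "CAC", "I": "ATC",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTC", "P": "CCG",
    "S": "TCT", "T": "ACC", "W": "TGG", "Y": "TAC", "V": "GTT",
    "*": "TAA",
}
AA_FOR = {codon: aa for aa, codon in CODON.items()}


def back_translate(protein: str) -> str:
    """Return one DNA sequence encoding `protein` using the table above."""
    try:
        return "".join(CODON[aa] for aa in protein.upper())
    except KeyError as err:
        raise ValueError(f"no codon chosen for residue {err.args[0]!r}") from None


def translate(dna: str) -> str:
    """Translate DNA made only of codons from the table (for round-trip checks)."""
    if len(dna) % 3:
        raise ValueError("length must be a multiple of 3")
    return "".join(AA_FOR[dna[i:i + 3]] for i in range(0, len(dna), 3))


gene = back_translate("PKPQRK")  # a short illustrative peptide
print(gene)  # CCGAAACCGCAGCGTAAA
```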

(SEQ ID NO:34) CCC CGA ATT CAA GCC GCC CAC ATA CCA TAC CTA GAA CAA GGA ATG CAT CTC GCA GAA CAA TTC AAA CAA AAG GCA CTT CGT C;
(SEQ ID NO:35) CCC CGG ATC CTA ACT AGC CTC TTC CAT CTC ATC AAA CTC CTG ATA CAA AAC CTC CCT ATC CGG GAT AAC AGC CGG ACG AAG TGC;
(SEQ ID NO:36) CCC CGA ATT CAA GCT AGT CAC TTA CCG TAT ATC GAG CAG GGA ATG CAG TTA GCT GAA CAG TTT AAA CAG AAG GCT CTG GCT TTT G;
(SEQ ID NO:37) CCC CGG ATC CTA AGG CCG AGC GTC AGA CTC AGG AAC ATA ATG AGT AGG AGA AAC ATG ATT ACC CCG AGA AGC AAA AGC CAG;
(SEQ ID NO:38) CCC CGA ATT CAA CGG CCT GCG ATA ATA CCG GAT AGG GAG GTT CTT CAT AGG GAG TTT GAC GAG ATG GAG GAG GCT TTT GCG;
(SEQ ID NO:39) CCC CGG ATC CTA CTG CGA AGC ATC AGA CTC AGG AAC ATA ATG AGC CGG ACT AAC ATG ATT CCC ACG AGA CGC AAA AGC C;
(SEQ ID NO:40) CCC CGA ATT CAA TCG CAG GCG GCG CCT TAT ATT GAG CAG GCT CAG GTT ATT GCT CAT CAG TTT AAG GAG AAG GTT CTT GCT TT;
(SEQ ID NO:41) CCC CGG ATC CTA CGG CTT CGC GTC CGA CTC AGG AAC ATA ATG AGT CGG AGA ATC ATG ATT ACC ACG AGA AGC AAA AGC AAG AA;
(SEQ ID NO:42) CCC CGA ATT CAA AAG CCG GCG ATA ATC CCT GAC CGT GAG GTT CTG TAT CGT GAG TTT GAT GAG ATG GAG GAG TCA CAG C;
(SEQ ID NO:43) CCC CGG ATC CTA AAA CGC CAG AGC CTT CTG CTT AAA CTG CTC AGC AAG CAT CAT ACC CTG CTC AAT GTA CGG AAG ATG CTG TGA CTC;
(SEQ ID NO:44) CCC CGA ATT CAA GCG TTT GCT TCT CGT GGT AAT CAT GTT GCT CCG ACT CAT TAT GTT ACG GAG TCA GAT GCT AAG C;
(SEQ ID NO:45) CCC CGG ATC CTA GAA AGC CTC CTC CAT CTC ATC ATA CTG CTG ATA AAG AAC CTC CTT ATC CGG AAC CAG AGC CGG CTT AGC ATC;
(SEQ ID NO:46) CCC CGA ATT CAA GCT TTC GCT TCT CGT GGT AAT CAT GTT GCT CCT ACG CAT TAT GTT GTT GAG TCA GAT GCT TCT GCT TC;
(SEQ ID NO:47) CCC CGG ATC CTA GAA AGC CAG AAC CTT CTC CTT AAA CTG ACC AGC AAT AGC ACG CGT CTC GTC CAT ATA CGG CAG AGA AGC AGA AG;
(SEQ ID NO:48) CCC CGA ATT CAA GCT TTC GCT AGT CGT GGG AAT CAT GTG TCG CCG CGT CAT TAT GTG CCT GAG TCT GAG CCT CAG GTT GT;
(SEQ ID NO:49) CCC CGG ATC CTA AGA AGC CTC CTC CAT CTC ATC AAA AGC CTC ATA CAG TAT CTC CTT ATC CGG CGT AAC AAC AAC CTG AG;
(SEQ ID NO:50) CCC CGA ATT CAA GCT TCT AAG GCC GCG CTG ATT GAG GAG GGT CAG CGT ATG G;
(SEQ ID NO:51) CCC CGG ATC CTA CTG GAT CTT AGA CTT CAG CAT CTC AGC CAT ACG CTG.

Gene expression and Protein Purification

To express the synthetic genes, E. coli JM109 competent cells (Promega, Madison, Wis.) were transformed with plasmids containing the 9 monomers, 4 dimers, 2 tetramers, and the full-size gene, constructed by the REAL method described in Example 1. Cells were grown overnight at 37° C. in Luria broth (LB) with 50 μg/ml ampicillin. The cultures were then diluted 1:10 in fresh LB with 50 μg/ml ampicillin and grown for 3 to 4 hours until the optical density at 600 nm reached 0.5-1.0. Gene expression from the tac promoter was induced by adding isopropyl-β-D-thiogalactoside (IPTG; Sigma Chemical Co., St. Louis, Mo.) to a final concentration of 1 mM. Cells were harvested 30 minutes after induction at 37° C. Cell lysates were prepared, and the soluble fraction was obtained by centrifugation at 12,000× g for 20 minutes. The glutathione S-transferase (GST)-mosaic proteins were purified by affinity chromatography on Glutathione Sepharose 4B columns (Pharmacia Biotech, Piscataway, N.J.).

Immunoblot Assay

Aliquots of each lysate or of the purified GST-mosaic proteins were subjected to electrophoresis on 12% polyacrylamide gels containing SDS (SDS-PAGE) and then blotted onto a nitrocellulose membrane. After protein transfer, the nitrocellulose membranes were incubated overnight at 4° C. with blocking solution (0.1 M phosphate-buffered saline containing 1% bovine serum albumin, 0.5% Tween 20, and 10% normal goat serum). They were then incubated for 1 hour at room temperature with human HCV positive sera diluted 1:100 or 1:200 in blocking solution. For immunodetection, the membranes were washed three times with blocking solution and incubated for 1 hour at room temperature with affinity-purified goat anti-human immunoglobulin G (IgG) conjugated to horseradish peroxidase (Bio-Rad, Richmond, Calif.) diluted 1:4000 or 1:6000 in blocking solution. After the membranes were washed three more times with blocking solution, diaminobenzidine (Sigma Chemical Co., St. Louis, Mo.) and hydrogen peroxide were added to detect the horseradish peroxidase (HRP) reporter molecule.

Enzyme Immunoassay (EIA)

One hundred microliters of the purified full-length NS4 mosaic fusion protein (GST-W3), at a concentration of 100 ng/ml in 0.1 M phosphate-buffered saline, pH 7.5, was adsorbed to microtiter wells (Immulon II; Dynatech Laboratories, Inc., Chantilly, Va.) overnight at room temperature. The wells were then incubated for 1 hour at 37° C. with human anti-HCV negative or positive sera diluted 1:500 in blocking solution (as described above for the immunoblot assay). After washing, goat anti-human immunoglobulin G (IgG) conjugated to HRP, diluted 1:4000, was added and incubated for 1 hour at 37° C. After the wells were washed 5 times, substrate and chromophore (Abbott Diagnostics Division, North Chicago, Ill.) were added and incubated in the dark for 30 minutes. The reaction was stopped with acid, and the optical density was measured at 493 nm.
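The coating conditions above correspond to a fixed amount of antigen per well. This sketch computes the mass offered to each well, not the mass that actually adsorbs. The 96-well plate format used for the per-plate total is an assumption.

```python
def ng_per_well(volume_ul: float, conc_ng_per_ml: float) -> float:
    """Antigen mass offered to one well: volume (ul) x concentration (ng/ml) / 1000."""
    return volume_ul * conc_ng_per_ml / 1000.0


def ng_per_plate(volume_ul: float, conc_ng_per_ml: float, wells: int = 96) -> float:
    """Total antigen needed for one plate; 96 wells is an assumed format."""
    return wells * ng_per_well(volume_ul, conc_ng_per_ml)


print(ng_per_well(100, 100))   # 10.0 ng per well
print(ng_per_plate(100, 100))  # 960.0 ng per 96-well plate
```

On this arithmetic, the NS4 EIA offers about 10 ng of fusion protein per well, compared with the 20 ng used for the NC Mosaic EIA.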

Human sera

Anti-HCV positive sera were obtained from Boehringer Mannheim Inc. (Penzberg, Germany) and from Boston Biomedica Inc. (West Bridgewater, Mass.). Anti-HCV negative sera were obtained from a collection of normal human blood donor specimens held at the Centers for Disease Control and Prevention (CDC, Atlanta, Ga.). All sera were classified as anti-HCV positive or negative by EIA, and initially reactive specimens were confirmed and further characterized by the supplemental test MATRIX (Abbott Laboratories, Abbott Park, Ill.).

Gene Assembly

The full-length artificial NS4 mosaic gene was constructed from synthetic oligonucleotides in sequential steps by REAL. As shown in FIG. 19, the oligonucleotide pairs were converted into 9 monomers (A, B, C, D, E, F, G, H, and I), which were then consecutively assembled into 4 dimers (AB, CD, EF, and GH). Consecutive dimers were then assembled into 2 tetramers (ABCD and EFGH). The final gene was assembled from the 2 tetramers and the remaining monomer (I).

Gene Expression and Immunoblot Assay

SDS-PAGE analysis showed that each fragment (9 monomers, 4 dimers, 2 tetramers, and the full-length gene) was expressed at high levels following induction with 1 mM IPTG for 5 hours at 37° C. Each expressed fragment, including the full-length gene product, was purified by ligand affinity chromatography. By SDS-PAGE, all of the purified proteins were highly pure, although many of them displayed an artifactual doublet. In addition, each purified protein was analyzed by immunoblot (FIG. 20) to determine its individual immunoreactivity with human anti-HCV positive sera. Most of the purified proteins were strongly immunoreactive with a single anti-HCV positive specimen diluted 1:200. Three monomers (A, H, and I), however, did not react with this specimen at a 1:200 dilution. Monomers A and H reacted with pooled sera diluted 1:10, indicating that these monomers do contain immunoreactive epitopes. Monomer I demonstrated weak immunoreactivity by EIA.

NS4 Mosaic EIA Frequency Distribution

A statistically valid cutoff value was determined by testing 160 anti-HCV negative sera and 166 anti-HCV positive sera (anti-NS4 positive by MATRIX) by EIA. Approximately 90% of the anti-HCV negative sera gave OD values below 0.09, while approximately 80% of the anti-HCV positive sera gave OD values above 2.1. The mean OD value for the anti-HCV negative specimens was 0.0518, with a standard deviation (SD) of 0.0273. The cutoff value was set at the mean OD of the anti-HCV negative sera plus 3.5 times the SD (0.0518 + 3.5 × 0.0273 ≈ 0.147). This cutoff clearly separated the negative sera from the positive sera (FIG. 21), although one negative specimen gave an OD value slightly above it. Using this cutoff value, all of the anti-HCV positive specimens tested positive by the NS4 Mosaic EIA. A two-by-two analysis of the data gave a sensitivity of 100% and a specificity of 99.4% at this cutoff. Raising the cutoff to the mean plus 4.3 times the SD gave a specificity of 100% compared with MATRIX.
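The reported cutoffs and the two-by-two figures can be reproduced from the summary statistics and counts given above: 166 of 166 positives detected, and 1 of 160 negatives above the cutoff. This is a minimal sketch; the function names are hypothetical.

```python
def cutoff(mean_od: float, sd: float, k: float) -> float:
    """Cutoff = mean OD of negative sera + k standard deviations."""
    return mean_od + k * sd


def sensitivity(true_pos: int, false_neg: int) -> float:
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    return true_neg / (true_neg + false_pos)


# Summary statistics reported for 160 anti-HCV negative sera.
MEAN_OD, SD = 0.0518, 0.0273
print(round(cutoff(MEAN_OD, SD, 3.5), 3))    # 0.147
print(round(cutoff(MEAN_OD, SD, 4.3), 3))    # 0.169
print(round(100 * sensitivity(166, 0), 1))   # 100.0
print(round(100 * specificity(159, 1), 1))   # 99.4
```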

NS4 Mosaic EIA Compared to MATRIX on Serially Diluted Anti-HCV Positive Sera

To examine how well the NS4 mosaic protein detects anti-NS4 activity, two serially diluted anti-NS4 positive sera were tested by the NS4 Mosaic EIA and by MATRIX. Anti-NS4 antibody was detected by the NS4 Mosaic EIA at a dilution of 1:128,000, while MATRIX was positive for anti-NS4 activity up to a dilution of approximately 1:4000. MATRIX uses two different NS4 proteins, expressed in E. coli and in yeast. By this comparison, the NS4 Mosaic EIA was 32 times more sensitive than MATRIX for specimen no. 1 (FIG. 22A) and 18 to 25 times more sensitive for specimen no. 2 (FIG. 22B).

NS4 Mosaic EIA Compared to MATRIX for the Detection of Anti-HCV

Among 182 anti-HCV positive sera, 97.8% tested positive for anti-NS4 activity by the NS4 Mosaic EIA, compared with 86.8% by MATRIX. These results strongly suggest that the mosaic protein is a more sensitive immunologic target than either of the NS4 antigens used in MATRIX. Antibody activity to the NS3 and nucleocapsid (NC) antigens by MATRIX was also compared with anti-NS4 activity against the mosaic protein. By MATRIX, 98.4% of the 182 sera tested positive for anti-NS3 and 94.5% for anti-NC. The NS4 Mosaic EIA therefore detected more positive sera than the MATRIX NC antigen and almost as many as the MATRIX NS3 antigen (FIG. 23).

NS4 Mosaic EIA Compared to MATRIX for Seroconversion Panels

Ten seroconversion panels (BioClinical Partners, Inc.; Serologicals, Chamblee, Ga.) were tested by the NS4 Mosaic EIA and by MATRIX to determine when anti-NS4 activity first appears in recently infected individuals. With a cutoff value of at least 2.5 times background, the NS4 Mosaic EIA detected anti-NS4 activity approximately 15 to 25 days earlier than MATRIX (FIG. 24). In some cases, the NS4 Mosaic EIA and MATRIX gave similar results; however, MATRIX never detected anti-NS4 activity earlier than the NS4 Mosaic EIA (data not shown). These results indicate that the NS4 mosaic protein, when used as the immunologic target in an EIA, was at least as sensitive as MATRIX for the early detection of anti-NS4 activity. It would probably have appeared more sensitive had more frequent bleed dates been available for each of the ten seroconversion panels.

NS4 Mosaic EIA Reactivity to Different HCV Genotypes

Because the NS4 mosaic protein is composed of antigenic regions derived from several HCV subtypes and genotypes, it should detect anti-NS4 activity in sera from patients infected with different genotypes. Sera from patients infected with genotypes 1-5 were tested for immunoreactivity by the NS4 Mosaic EIA. The only specimens that did not react with the mosaic protein were those that tested negative for anti-NS4 activity by MATRIX. These data indicate that the mosaic protein detected anti-NS4 activity in each of the genotypes tested and was 100% concordant with MATRIX (FIG. 25).

    __________________________________________________________________________     #             SEQUENCE LISTING                                                 - (1) GENERAL INFORMATION:                                                     -    (iii) NUMBER OF SEQUENCES: 55                                             - (2) INFORMATION FOR SEQ ID NO:1:                                             -      (i) SEQUENCE CHARACTERISTICS:                                           #pairs    (A) LENGTH: 55 base                                                            (B) TYPE: nucleic acid                                                         (C) STRANDEDNESS: single                                                       (D) TOPOLOGY: linear                                                 -     (ii) MOLECULE TYPE: cDNA                                                 -    (iii) HYPOTHETICAL: NO                                                    -     (iv) ANTI-SENSE: NO                                                      -      (v) FRAGMENT TYPE: internal                                             -     (vi) ORIGINAL SOURCE:                                                    #virus    (A) ORGANISM: Hepatitis                                              -     (xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:                                  - CCCCGAATTC AACCGAAACC GCAACGTAAA ACCAAACGTA ACACCATTCG TC - #GTC               55                                                                           - (2) INFORMATION FOR SEQ ID NO:2:                                             -      (i) SEQUENCE CHARACTERISTICS:                                           #pairs    (A) LENGTH: 69 base                                                            (B) TYPE: nucleic acid                                                         (C) STRANDEDNESS: single                                                       (D) TOPOLOGY: linear                                                 -     (ii) MOLECULE 
TYPE: cDNA                                                 -    (iii) HYPOTHETICAL: NO                                                    -     (iv) ANTI-SENSE: NO                                                      -      (v) FRAGMENT TYPE: internal                                             -     (vi) ORIGINAL SOURCE:                                                    #virus    (A) ORGANISM: Hepatitis                                              -     (xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:                                  - CCCCGGATCC TATTTCGGAC CAACGATCTG ACCACCACCC GGGAATTTAA CG - #TCCTGCGG          60                                                                           #         69                                                                   - (2) INFORMATION FOR SEQ ID NO:3:                                             -      (i) SEQUENCE CHARACTERISTICS:                                           #pairs    (A) LENGTH: 54 base                                                            (B) TYPE: nucleic acid                                                         (C) STRANDEDNESS: single                                                       (D) TOPOLOGY: linear                                                 -     (ii) MOLECULE TYPE: cDNA                                                 -    (iii) HYPOTHETICAL: NO                                                    -     (iv) ANTI-SENSE: NO                                                      -      (v) FRAGMENT TYPE: internal                                             -     (vi) ORIGINAL SOURCE:                                                    #virus    (A) ORGANISM: Hepatitis                                              -     (xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:                                  - CCCCGAATTC AACCGAAACC GCAACGTCAG ACCAAACGTA ACACCAACCG TC - #GT                54                                                                           - (2) INFORMATION FOR SEQ ID NO:4:           
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 70 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: cDNA
   (iii) HYPOTHETICAL: NO
    (iv) ANTI-SENSE: NO
     (v) FRAGMENT TYPE: internal
    (vi) ORIGINAL SOURCE:
          (A) ORGANISM: Hepatitis virus
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:
CCCCGGATCC TATTTCGGAC CAACGATCTG ACCACCACCC GGGAATTTAA CGTCCTGCGG       60
                                                                        70

(2) INFORMATION FOR SEQ ID NO:5:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 51 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: cDNA
   (iii) HYPOTHETICAL: NO
    (iv) ANTI-SENSE: NO
     (v) FRAGMENT TYPE: internal
    (vi) ORIGINAL SOURCE:
          (A) ORGANISM: Hepatitis virus
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:
             CGAAACC GCAACGTAAA ACCAAACGTA ACACCTACCG T                 51

(2) INFORMATION FOR SEQ ID NO:6:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 73 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: cDNA
   (iii) HYPOTHETICAL: NO
    (iv) ANTI-SENSE: NO
     (v) FRAGMENT TYPE: internal
    (vi) ORIGINAL SOURCE:
          (A) ORGANISM: Hepatitis virus
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:6:
CCCCGGATCC TATTTCGGAC CAACGATCTG ACCACCACCC GGGAATTTAA CGTCCTGCGG       60
                                                                        73

(2) INFORMATION FOR SEQ ID NO:7:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 55 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: cDNA
   (iii) HYPOTHETICAL: NO
    (iv) ANTI-SENSE: NO
     (v) FRAGMENT TYPE: internal
    (vi) ORIGINAL SOURCE:
          (A) ORGANISM: Hepatitis virus
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:7:
CCCCGAATTC AACCGAAACC GCAACGTAAA CCGAACCGTA ACACCAACCG TCGTC            55

(2) INFORMATION FOR SEQ ID NO:8:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 69 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: cDNA
   (iii) HYPOTHETICAL: NO
    (iv) ANTI-SENSE: NO
     (v) FRAGMENT TYPE: internal
    (vi) ORIGINAL SOURCE:
          (A) ORGANISM: Hepatitis virus
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:
CCCCGGATCC TATTTCGGAC CAACGATCTG ACCACCACCC GGGAATTTAA CGTCCTGCGG       60
                                                                        69

(2) INFORMATION FOR SEQ ID NO:9:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 63 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: cDNA
   (iii) HYPOTHETICAL: NO
    (iv) ANTI-SENSE: NO
     (v) FRAGMENT TYPE: internal
    (vi) ORIGINAL SOURCE:
          (A) ORGANISM: Hepatitis virus
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:9:
CCCCGAATTC AACCGAAACC GCAACGTCAG CCGAAACGTA ACACCCCGCG TCGTCCGCAG       60
                                                                        63

(2) INFORMATION FOR SEQ ID NO:10:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 60 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: cDNA
   (iii) HYPOTHETICAL: NO
    (iv) ANTI-SENSE: NO
     (v) FRAGMENT TYPE: internal
    (vi) ORIGINAL SOURCE:
          (A) ORGANISM: Hepatitis virus
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:10:
CCCCGGATCC TATTTCGGAC CAACGATCTG ACCACCACCC GGGAATTTAA CGTCCTGCGG       60

(2) INFORMATION FOR SEQ ID NO:11:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 55 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: cDNA
   (iii) HYPOTHETICAL: NO
    (iv) ANTI-SENSE: NO
     (v) FRAGMENT TYPE: internal
    (vi) ORIGINAL SOURCE:
          (A) ORGANISM: Hepatitis virus
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:11:
CCCCGAATTC AACCGAAACC GCAACGTAAA ACCAAACGTA ACGCTCACCG TCGTC            55

(2) INFORMATION FOR SEQ ID NO:12:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 69 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: cDNA
   (iii) HYPOTHETICAL: NO
    (iv) ANTI-SENSE: NO
     (v) FRAGMENT TYPE: internal
    (vi) ORIGINAL SOURCE:
          (A) ORGANISM: Hepatitis virus
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:12:
CCCCGGATCC TATTTCGGAC CAACGATCTG ACCACCACCC GGGAATTTAA CGTCCTGCGG       60
                                                                        69

(2) INFORMATION FOR SEQ ID NO:13:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 55 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: cDNA
   (iii) HYPOTHETICAL: NO
    (iv) ANTI-SENSE: NO
     (v) FRAGMENT TYPE: internal
    (vi) ORIGINAL SOURCE:
          (A) ORGANISM: Hepatitis virus
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:13:
CCCCGAATTC AACCGAAACC GCAACGTAAA AACCAGCGTA ACACCAACCG TCGTC            55

(2) INFORMATION FOR SEQ ID NO:14:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 69 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: cDNA
   (iii) HYPOTHETICAL: NO
    (iv) ANTI-SENSE: NO
     (v) FRAGMENT TYPE: internal
    (vi) ORIGINAL SOURCE:
          (A) ORGANISM: Hepatitis virus
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:14:
CCCCGGATCC TATTTCGGAC CAACGATCTG ACCACCACCC GGGAATTTAA CGTCCTGCGG       60
                                                                        69

(2) INFORMATION FOR SEQ ID NO:15:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 55 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: cDNA
   (iii) HYPOTHETICAL: NO
    (iv) ANTI-SENSE: NO
     (v) FRAGMENT TYPE: internal
    (vi) ORIGINAL SOURCE:
          (A) ORGANISM: Hepatitis virus
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:15:
CCCCGAATTC AACCGAAACC GCAACGTAAA ACCAAACGTA ACACCATTCG TCGTC            55

(2) INFORMATION FOR SEQ ID NO:16:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 69 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: cDNA
   (iii) HYPOTHETICAL: NO
    (iv) ANTI-SENSE: NO
     (v) FRAGMENT TYPE: internal
    (vi) ORIGINAL SOURCE:
          (A) ORGANISM: Hepatitis virus
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:16:
CCCCGGATCC TATTTCGGAA CGTAGATAAC ACCACCACCC GGGAATTTAA CGTCCTGCGG       60
                                                                        69

(2) INFORMATION FOR SEQ ID NO:17:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 56 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: cDNA
   (iii) HYPOTHETICAL: NO
    (iv) ANTI-SENSE: NO
     (v) FRAGMENT TYPE: internal
    (vi) ORIGINAL SOURCE:
          (A) ORGANISM: Hepatitis virus
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:17:
CCCCGAATTC AACCGAAACC GCAACGTAAA ACCGAACGTA ACACCAACCG TCGTCC           56

(2) INFORMATION FOR SEQ ID NO:18:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 65 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: cDNA
   (iii) HYPOTHETICAL: NO
    (iv) ANTI-SENSE: NO
     (v) FRAGMENT TYPE: internal
    (vi) ORIGINAL SOURCE:
          (A) ORGANISM: Hepatitis virus
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:18:
CCCCGGATCC TATTTCGGAC CAACGATCTG ACCACCACCA GAGAAACGAA CGTCCGGACG       60
                                                                        65

(2) INFORMATION FOR SEQ ID NO:19:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 54 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: cDNA
   (iii) HYPOTHETICAL: NO
    (iv) ANTI-SENSE: NO
     (v) FRAGMENT TYPE: internal
    (vi) ORIGINAL SOURCE:
          (A) ORGANISM: Hepatitis virus
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:19:
CCCCGAATTC AACCGAAACC GAAACGTCAG ACCAAACGTA ACACCCTGCG TCGT             54

(2) INFORMATION FOR SEQ ID NO:20:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 73 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: cDNA
   (iii) HYPOTHETICAL: NO
    (iv) ANTI-SENSE: NO
     (v) FRAGMENT TYPE: internal
    (vi) ORIGINAL SOURCE:
          (A) ORGANISM: Hepatitis virus
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:20:
CCCCGGATCC TATTTCGGAC CAACGATCTG ACCACCAGCC GGGAATTTAA CGTTTTTCGG       60
                                                                        73

(2) INFORMATION FOR SEQ ID NO:21:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 55 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: cDNA
   (iii) HYPOTHETICAL: NO
    (iv) ANTI-SENSE: NO
     (v) FRAGMENT TYPE: internal
    (vi) ORIGINAL SOURCE:
          (A) ORGANISM: Hepatitis virus
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:21:
CCCCGAATTC AACCGAAACC GCAACGTAAA ACCAAACGTA AAGCTCACCG TCGTC            55

(2) INFORMATION FOR SEQ ID NO:22:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 69 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: cDNA
   (iii) HYPOTHETICAL: NO
    (iv) ANTI-SENSE: NO
     (v) FRAGMENT TYPE: internal
    (vi) ORIGINAL SOURCE:
          (A) ORGANISM: Hepatitis virus
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:22:
CCCCGGATCC TATTTCGGAC CAACGATCTG ACCACCACCC GGGAATTTAA CGTCCTGCGG       60
                                                                        69

(2) INFORMATION FOR SEQ ID NO:23:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 28 amino acids
          (B) TYPE: amino acid
          (C) STRANDEDNESS: Not Relevant
          (D) TOPOLOGY: Not Relevant
    (ii) MOLECULE TYPE: protein
   (iii) HYPOTHETICAL: NO
    (iv) ANTI-SENSE: NO
    (vi) ORIGINAL SOURCE:
          (A) ORGANISM: Hepatitis virus
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:23:
Pro Lys Pro Gln Arg Lys Thr Lys Arg Asn Thr Ile Arg Arg Pro Gln
1               5                   10                  15
Asp Val Lys Phe Pro Gly Gly Gly Gln Ile Val Gly
            20                  25

(2) INFORMATION FOR SEQ ID NO:24:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 28 amino acids
          (B) TYPE: amino acid
          (C) STRANDEDNESS: Not Relevant
          (D) TOPOLOGY: Not Relevant
    (ii) MOLECULE TYPE: protein
   (iii) HYPOTHETICAL: NO
    (iv) ANTI-SENSE: NO
     (v) FRAGMENT TYPE: internal
    (vi) ORIGINAL SOURCE:
          (A) ORGANISM: Hepatitis virus
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:24:
Pro Lys Pro Gln Arg Gln Thr Lys Arg Asn Thr Asn Arg Arg Pro Gln
1               5                   10                  15
Asp Val Lys Phe Pro Gly Gly Gly Gln Ile Val Gly
            20                  25

(2) INFORMATION FOR SEQ ID NO:25:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 28 amino acids
          (B) TYPE: amino acid
          (C) STRANDEDNESS: Not Relevant
          (D) TOPOLOGY: Not Relevant
    (ii) MOLECULE TYPE: protein
   (iii) HYPOTHETICAL: NO
    (iv) ANTI-SENSE: NO
     (v) FRAGMENT TYPE: internal
    (vi) ORIGINAL SOURCE:
          (A) ORGANISM: Hepatitis virus
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:25:
Pro Lys Pro Gln Arg Lys Thr Lys Arg Asn Thr Tyr Arg Arg Pro Gln
1               5                   10                  15
Asp Val Lys Phe Pro Gly Gly Gly Gln Ile Val Gly
            20                  25

(2) INFORMATION FOR SEQ ID NO:26:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 28 amino acids
          (B) TYPE: amino acid
          (C) STRANDEDNESS: Not Relevant
          (D) TOPOLOGY: Not Relevant
    (ii) MOLECULE TYPE: protein
   (iii) HYPOTHETICAL: NO
    (iv) ANTI-SENSE: NO
     (v) FRAGMENT TYPE: internal
    (vi) ORIGINAL SOURCE:
          (A) ORGANISM: Hepatitis virus
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:26:
Pro Lys Pro Gln Arg Lys Pro Asn Arg Asn Thr Asn Arg Arg Pro Gln
1               5                   10                  15
Asp Val Lys Phe Pro Gly Gly Gly Gln Ile Val Gly
            20                  25

(2) INFORMATION FOR SEQ ID NO:27:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 28 amino acids
          (B) TYPE: amino acid
          (C) STRANDEDNESS: Not Relevant
          (D) TOPOLOGY: Not Relevant
    (ii) MOLECULE TYPE: protein
   (iii) HYPOTHETICAL: NO
    (iv) ANTI-SENSE: NO
     (v) FRAGMENT TYPE: internal
    (vi) ORIGINAL SOURCE:
          (A) ORGANISM: Hepatitis virus
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:27:
Pro Lys Pro Gln Arg Gln Pro Lys Arg Asn Thr Pro Arg Arg Pro Gln
1               5                   10                  15
Asp Val Lys Phe Pro Gly Gly Gly Gln Ile Val Gly
            20                  25

(2) INFORMATION FOR SEQ ID NO:28:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 28 amino acids
          (B) TYPE: amino acid
          (C) STRANDEDNESS: Not Relevant
          (D) TOPOLOGY: Not Relevant
    (ii) MOLECULE TYPE: protein
   (iii) HYPOTHETICAL: NO
    (iv) ANTI-SENSE: NO
     (v) FRAGMENT TYPE: internal
    (vi) ORIGINAL SOURCE:
          (A) ORGANISM: Hepatitis virus
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:28:
Pro Lys Pro Gln Arg Lys Thr Lys Arg Asn Ala His Arg Arg Pro Gln
1               5                   10                  15
Asp Val Lys Phe Pro Gly Gly Gly Gln Ile Val Gly
            20                  25

(2) INFORMATION FOR SEQ ID NO:29:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 28 amino acids
          (B) TYPE: amino acid
          (C) STRANDEDNESS: Not Relevant
          (D) TOPOLOGY: Not Relevant
    (ii) MOLECULE TYPE: protein
   (iii) HYPOTHETICAL: NO
    (iv) ANTI-SENSE: NO
     (v) FRAGMENT TYPE: internal
    (vi) ORIGINAL SOURCE:
          (A) ORGANISM: Hepatitis virus
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:29:

Pro Lys Pro Gln Lys Arg Asn Gln Arg Asn Thr Asn Arg Arg Pro Gln
                                                        15
Asp Val Lys Phe Pro Gly Gly Gly Gln Ile Val Gly
                                25

(2) INFORMATION FOR SEQ ID NO:30:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 28 amino acids
          (B) TYPE: amino acid
          (C) STRANDEDNESS: Not Relevant
          (D) TOPOLOGY: Not Relevant
    (ii) MOLECULE TYPE: protein
   (iii) HYPOTHETICAL: NO
    (iv) ANTI-SENSE: NO
     (v) FRAGMENT TYPE: internal
    (vi) ORIGINAL SOURCE:
          (A) ORGANISM: Hepatitis virus
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:30:

Pro Lys Pro Gln Arg Lys Thr Lys Arg Asn Thr Ile Arg Arg Pro Gln
                                                        15
Asp Val Lys Phe Pro Gly Gly Gly Val Ile Tyr Val
                                25

(2) INFORMATION FOR SEQ ID NO:31:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 28 amino acids
          (B) TYPE: amino acid
          (C) STRANDEDNESS: Not Relevant
          (D) TOPOLOGY: Not Relevant
    (ii) MOLECULE TYPE: protein
   (iii) HYPOTHETICAL: NO
    (iv) ANTI-SENSE: NO
     (v) FRAGMENT TYPE: internal
    (vi) ORIGINAL SOURCE:
          (A) ORGANISM: Hepatitis virus
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:31:

Pro Lys Pro Gln Arg Lys Thr Glu Arg Asn Thr Asn Arg Arg Pro Gln
                                                        15
Asp Val Arg Phe Ser Gly Gly Gly Gln Ile Val Gly
                                25

(2) INFORMATION FOR SEQ ID NO:32:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 28 amino acids
          (B) TYPE: amino acid
          (C) STRANDEDNESS: Not Relevant
          (D) TOPOLOGY: Not Relevant
    (ii) MOLECULE TYPE: protein
   (iii) HYPOTHETICAL: NO
    (iv) ANTI-SENSE: NO
     (v) FRAGMENT TYPE: internal
    (vi) ORIGINAL SOURCE:
          (A) ORGANISM: Hepatitis virus
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:32:

Pro Lys Pro Lys Arg Gln Thr Lys Arg Asn Thr Leu Arg Arg Pro Lys
                                                        15
Asn Val Lys Phe Pro Ala Gly Gly Gln Ile Val Gly
                                25

(2) INFORMATION FOR SEQ ID NO:33:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 28 amino acids
          (B) TYPE: amino acid
          (C) STRANDEDNESS: Not Relevant
          (D) TOPOLOGY: Not Relevant
    (ii) MOLECULE TYPE: protein
   (iii) HYPOTHETICAL: NO
    (iv) ANTI-SENSE: NO
     (v) FRAGMENT TYPE: internal
    (vi) ORIGINAL SOURCE:
          (A) ORGANISM: Hepatitis virus
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:33:

Pro Lys Pro Gln Arg Lys Thr Lys Arg Lys Ala His Arg Arg Pro Gln
                                                        15
Asp Val Lys Phe Pro Gly Gly Gly Gln Ile Val Gly
                                25

(2) INFORMATION FOR SEQ ID NO:34:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 82 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: cDNA
   (iii) HYPOTHETICAL: NO
    (iv) ANTI-SENSE: NO
     (v) FRAGMENT TYPE: internal
    (vi) ORIGINAL SOURCE:
          (A) ORGANISM: Hepatitis virus
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:34:

CCCCGAATTC AAGCCGCCCA CATACCATAC CTAGAACAAG GAATGCATCT CGCAGAACAA     60
...TCG TC     82

(2) INFORMATION FOR SEQ ID NO:35:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 84 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: cDNA
   (iii) HYPOTHETICAL: NO
    (iv) ANTI-SENSE: NO
     (v) FRAGMENT TYPE: internal
    (vi) ORIGINAL SOURCE:
          (A) ORGANISM: Hepatitis virus
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:35:

CCCCGGATCC TAACTAGCCT CTTCCATCTC ATCAAACTCC TGATACAAAA CCTCCCTATC     60
...CGAA GTGC     84

(2) INFORMATION FOR SEQ ID NO:36:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 85 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: cDNA
   (iii) HYPOTHETICAL: NO
    (iv) ANTI-SENSE: NO
     (v) FRAGMENT TYPE: internal
    (vi) ORIGINAL SOURCE:
          (A) ORGANISM: Hepatitis virus
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:36:

CCCCGAATTC AAGCTAGTCA CTTACCGTAT ATCGAGCAGG GAATGCAGTT AGCTGAACAG     60
...TGGC TTTTG     85

(2) INFORMATION FOR SEQ ID NO:37:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 81 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: cDNA
   (iii) HYPOTHETICAL: NO
    (iv) ANTI-SENSE: NO
     (v) FRAGMENT TYPE: internal
    (vi) ORIGINAL SOURCE:
          (A) ORGANISM: Hepatitis virus
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:37:

CCCCGGATCC TAAGGCCGAG CGTCAGACTC AGGAACATAA TGAGTAGGAG AAACATGATT     60
...GCCA G     81

(2) INFORMATION FOR SEQ ID NO:38:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 81 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: cDNA
   (iii) HYPOTHETICAL: NO
    (iv) ANTI-SENSE: NO
     (v) FRAGMENT TYPE: internal
    (vi) ORIGINAL SOURCE:
          (A) ORGANISM: Hepatitis virus
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:38:

CCCCGAATTC AACGGCCTGC GATAATACCG GATAGGGAGG TTCTTCATAG GGAGTTTGAC     60
...TTGC G     81

(2) INFORMATION FOR SEQ ID NO:39:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 79 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: cDNA
   (iii) HYPOTHETICAL: NO
    (iv) ANTI-SENSE: NO
     (v) FRAGMENT TYPE: internal
    (vi) ORIGINAL SOURCE:
          (A) ORGANISM: Hepatitis virus
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:39:

CCCCGGATCC TACTGCGAAG CATCAGACTC AGGAACATAA TGAGCCGGAC TAACATGATT     60
...GCC     79

(2) INFORMATION FOR SEQ ID NO:40:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 83 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: cDNA
   (iii) HYPOTHETICAL: NO
    (iv) ANTI-SENSE: NO
     (v) FRAGMENT TYPE: internal
    (vi) ORIGINAL SOURCE:
          (A) ORGANISM: Hepatitis virus
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:40:

CCCCGAATTC AATCGCAGGC GGCGCCTTAT ATTGAGCAGG CTCAGGTTAT TGCTCATCAG     60
...TTGC TTT     83

(2) INFORMATION FOR SEQ ID NO:41:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 83 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: cDNA
   (iii) HYPOTHETICAL: NO
    (iv) ANTI-SENSE: NO
     (v) FRAGMENT TYPE: internal
    (vi) ORIGINAL SOURCE:
          (A) ORGANISM: Hepatitis virus
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:41:

CCCCGGATCC TACGGCTTCG CGTCCGACTC AGGAACATAA TGAGTCGGAG AATCATGATT     60
...GCAA GAA     83

(2) INFORMATION FOR SEQ ID NO:42:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 79 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: cDNA
   (iii) HYPOTHETICAL: NO
    (iv) ANTI-SENSE: NO
     (v) FRAGMENT TYPE: internal
    (vi) ORIGINAL SOURCE:
          (A) ORGANISM: Hepatitis virus
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:42:

CCCCGAATTC AAAAGCCGGC GATAATCCCT GACCGTGAGG TTCTGTATCG TGAGTTTGAT     60
...AGC     79

(2) INFORMATION FOR SEQ ID NO:43:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 87 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: cDNA
   (iii) HYPOTHETICAL: NO
    (iv) ANTI-SENSE: NO
     (v) FRAGMENT TYPE: internal
    (vi) ORIGINAL SOURCE:
          (A) ORGANISM: Hepatitis virus
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:43:

CCCCGGATCC TAAAACGCCA GAGCCTTCTG CTTAAACTGC TCAGCAAGCA TCATACCCTG     60
...TGCT GTGACTC     87

(2) INFORMATION FOR SEQ ID NO:44:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 76 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: cDNA
   (iii) HYPOTHETICAL: NO
    (iv) ANTI-SENSE: NO
     (v) FRAGMENT TYPE: internal
    (vi) ORIGINAL SOURCE:
          (A) ORGANISM: Hepatitis virus
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:44:

CCCCGAATTC AAGCGTTTGC TTCTCGTGGT AATCATGTTG CTCCGACTCA TTATGTTACG     60
...     76

(2) INFORMATION FOR SEQ ID NO:45:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 84 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: cDNA
   (iii) HYPOTHETICAL: NO
    (iv) ANTI-SENSE: NO
     (v) FRAGMENT TYPE: internal
    (vi) ORIGINAL SOURCE:
          (A) ORGANISM: Hepatitis virus
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:45:

CCCCGGATCC TAGAAAGCCT CCTCCATCTC ATCATACTGC TGATAAAGAA CCTCCTTATC     60
...TTAG CATC     84

(2) INFORMATION FOR SEQ ID NO:46:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 80 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: cDNA
   (iii) HYPOTHETICAL: NO
    (iv) ANTI-SENSE: NO
     (v) FRAGMENT TYPE: internal
    (vi) ORIGINAL SOURCE:
          (A) ORGANISM: Hepatitis virus
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:46:

CCCCGAATTC AAGCTTTCGC TTCTCGTGGT AATCATGTTG CTCCTACGCA TTATGTTGTT     60
...CTTC     80

(2) INFORMATION FOR SEQ ID NO:47:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 86 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: cDNA
   (iii) HYPOTHETICAL: NO
    (iv) ANTI-SENSE: NO
     (v) FRAGMENT TYPE: internal
    (vi) ORIGINAL SOURCE:
          (A) ORGANISM: Hepatitis virus
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:47:

CCCCGGATCC TAGAAAGCCA GAACCTTCTC CTTAAACTGA CCAGCAATAG CACGCGTCTC     60
...GAAG CAGAAG     86

(2) INFORMATION FOR SEQ ID NO:48:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 80 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: cDNA
   (iii) HYPOTHETICAL: NO
    (iv) ANTI-SENSE: NO
     (v) FRAGMENT TYPE: internal
    (vi) ORIGINAL SOURCE:
          (A) ORGANISM: Hepatitis virus
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:48:

CCCCGAATTC AAGCTTTCGC TAGTCGTGGG AATCATGTGT CGCCGCGTCA TTATGTGCCT     60
...TTGT     80

(2) INFORMATION FOR SEQ ID NO:49:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 80 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: cDNA
   (iii) HYPOTHETICAL: NO
    (iv) ANTI-SENSE: NO
     (v) FRAGMENT TYPE: internal
    (vi) ORIGINAL SOURCE:
          (A) ORGANISM: Hepatitis virus
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:49:

CCCCGGATCC TAAGAAGCCT CCTCCATCTC ATCAAAAGCC TCATACAGTA TCTCCTTATC     60
...TGAG     80

(2) INFORMATION FOR SEQ ID NO:50:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 52 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: cDNA
   (iii) HYPOTHETICAL: NO
    (iv) ANTI-SENSE: NO
     (v) FRAGMENT TYPE: internal
    (vi) ORIGINAL SOURCE:
          (A) ORGANISM: Hepatitis virus
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:50:

CCCCGAATTC AAGCTTCTAA GGCCGCGCTG ATTGAGGAGG GTCAGCGTAT GG     52

(2) INFORMATION FOR SEQ ID NO:51:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 48 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: cDNA
   (iii) HYPOTHETICAL: NO
    (iv) ANTI-SENSE: NO
     (v) FRAGMENT TYPE: internal
    (vi) ORIGINAL SOURCE:
          (A) ORGANISM: Hepatitis virus
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:51:

...ATCT TAGACTTCAG CATCTCAGCC ATACGCTG     48

(2) INFORMATION FOR SEQ ID NO:52:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 352 amino acids
          (B) TYPE: amino acid
          (C) STRANDEDNESS: Not Relevant
          (D) TOPOLOGY: Not Relevant
    (ii) MOLECULE TYPE: protein
   (iii) HYPOTHETICAL: NO
    (iv) ANTI-SENSE: NO
     (v) FRAGMENT TYPE: internal
    (vi) ORIGINAL SOURCE:
          (A) ORGANISM: Hepatitis virus
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:52:

Ala Ala His Ile Pro Tyr Leu Glu Gln Gly Met His Leu Ala Glu Gln
                                                        15
Phe Lys Gln Lys Ala Leu Arg Pro Ala Val Ile Pro Asp Arg Glu Val
                                                    30
Leu Tyr Gln Glu Phe Asp Glu Met Glu Glu Ala Ser His Leu Pro Tyr
                                                45
Ile Glu Gln Gly Met Gln Leu Ala Glu Gln Phe Lys Gln Lys Ala Leu
                                            60
Ala Phe Ala Ser Arg Gln Asn His Val Ser Pro Thr His Tyr Val Pro
                                                            80
Glu Ser Asp Ala Arg Pro Ala Ile Ile Pro Asp Arg Glu Val Leu His
                                                        95
Arg Glu Phe Asp Glu Met Glu Glu Ala Phe Ala Ser Arg Gly Asn His
                                                    110
Val Ser Pro Ala His Tyr Val Pro Glu Ser Asp Ala Ser Gln Ala Ala
                                                125
Pro Tyr Ile Glu Gln Ala Gln Val Ile Ala His Gln Phe Lys Glu Lys
                                            140
Val Leu Ala Phe Ala Ser Arg Gly Asn His Asp Ser Pro Thr His Tyr
145                 150                 155                 160
Val Pro Glu Ser Asp Ala Lys Pro Ala Ile Ile Pro Asp Arg Glu Val
                                                        175
Leu Tyr Arg Glu Phe Asp Glu Met Glu Glu Ser Gln His Leu Pro Tyr
                                                    190
Ile Glu Gln Gly Met Met Leu Ala Glu Gln Phe Lys Gln Lys Ala Leu
                                                205
Ala Phe Ala Ser Arg Gly Asn His Val Ala Pro Thr His Tyr Val Thr
                                            220
Glu Ser Asp Ala Lys Pro Ala Leu Val Pro Asp Lys Glu Val Leu Tyr
225                 230                 235                 240
Gln Gln Tyr Asp Glu Met Glu Glu Ala Phe Ala Ser Arg Gly Asn His
                                                        255
Val Ala Pro Thr His Tyr Val Val Glu Ser Asp Ala Ser Ala Ser Leu
                                                    270
Pro Tyr Met Asp Glu Thr Arg Ala Ile Ala Gly Gln Phe Lys Glu Lys
                                                285
Val Leu Ala Phe Ala Ser Arg Gly Asn His Val Ser Pro Arg His Tyr
                                            300
Val Pro Glu Ser Glu Pro Gln Val Val Val Thr Pro Asp Lys Glu Ile
305                 310                 315                 320
Leu Tyr Glu Ala Phe Asp Glu Met Glu Glu Ala Ser Lys Ala Ala Leu
                                                        335
Ile Glu Glu Gly Gln Arg Met Ala Glu Met Leu Lys Ser Lys Ile Gln
                                                    350

(2) INFORMATION FOR SEQ ID NO:53:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 57 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: double
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: cDNA
   (iii) HYPOTHETICAL: NO
    (iv) ANTI-SENSE: NO
     (v) FRAGMENT TYPE: internal
    (vi) ORIGINAL SOURCE:
                                #virus    (A) ORGANISM: Hepatitis                                              -     (xi) SEQUENCE DESCRIPTION: SEQ ID NO:53:                                 - CTGGTTCCGC GTGGATCCCC AGGAATTCCC GGGTCGACTC GAGCGGCCGC AT - #CGTGA             57                                                                           - (2) INFORMATION FOR SEQ ID NO:54:                                            -      (i) SEQUENCE CHARACTERISTICS:                                           #pairs    (A) LENGTH: 27 base                                                            (B) TYPE: nucleic acid                                                         (C) STRANDEDNESS: double                                                       (D) TOPOLOGY: linear                                                 -     (ii) MOLECULE TYPE: cDNA                                                 -    (iii) HYPOTHETICAL: NO                                                    -     (iv) ANTI-SENSE: NO                                                      -      (v) FRAGMENT TYPE: internal                                             -     (vi) ORIGINAL SOURCE:                                                    #virus    (A) ORGANISM: Hepatitis                                              -     (xi) SEQUENCE DESCRIPTION: SEQ ID NO:54:                                 #             27   AGGA TCCATCC                                                - (2) INFORMATION FOR SEQ ID NO:55:                                            -      (i) SEQUENCE CHARACTERISTICS:                                           #pairs    (A) LENGTH: 56 base                                                            (B) TYPE: nucleic acid                                                         (C) STRANDEDNESS: double                                                       (D) TOPOLOGY: linear                                                 -     (ii) MOLECULE TYPE: cDNA                                          
       -    (iii) HYPOTHETICAL: NO                                                    -     (iv) ANTI-SENSE: NO                                                      -      (v) FRAGMENT TYPE: internal                                             -     (vi) ORIGINAL SOURCE:                                                    #virus    (A) ORGANISM: Hepatitis                                              -     (xi) SEQUENCE DESCRIPTION: SEQ ID NO:55:                                 - CTGGTTCCGC GTGGATCGCA GCGAATTCTC GAGGATCCAT CCCGGCCGCA TC - #GTGA              56                                                                           __________________________________________________________________________ 
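The two complete linker sequences in this listing (SEQ ID NOs:53 and 55) can be checked against their stated lengths and scanned for restriction-enzyme recognition sites with a short script. This is a minimal sketch for illustration, not a procedure from the patent: the particular enzymes scanned and the names `RECOGNITION_SITES`, `SEQUENCES`, `STATED_LENGTHS` and `find_sites` are hypothetical choices. SEQ ID NO:54 is left out because its sequence is only partially reproduced above. Every recognition sequence in the table is palindromic, so scanning the top strand also covers the bottom strand.

```python
# Hypothetical helper (not from the patent): verify the stated lengths of the
# linker sequences SEQ ID NOs:53 and 55 and locate common restriction sites.

# Top-strand recognition sequences for a few common enzymes. This enzyme list
# is an illustrative assumption; all of these sites are palindromic.
RECOGNITION_SITES = {
    "BamHI": "GGATCC",
    "EcoRI": "GAATTC",
    "SmaI": "CCCGGG",
    "SalI": "GTCGAC",
    "XhoI": "CTCGAG",
    "NotI": "GCGGCCGC",
}

# Sequences copied from the listing, including its 10-base grouping spaces.
SEQUENCES = {
    53: "CTGGTTCCGC GTGGATCCCC AGGAATTCCC GGGTCGACTC GAGCGGCCGC ATCGTGA",
    55: "CTGGTTCCGC GTGGATCGCA GCGAATTCTC GAGGATCCAT CCCGGCCGCA TCGTGA",
}

# Lengths stated in each entry's "(A) LENGTH" field.
STATED_LENGTHS = {53: 57, 55: 56}


def clean(seq: str) -> str:
    """Strip the grouping spaces used in the printed listing."""
    return seq.replace(" ", "").upper()


def find_sites(seq: str) -> dict[str, list[int]]:
    """Return 1-based start positions of each recognition site present in seq."""
    seq = clean(seq)
    hits: dict[str, list[int]] = {}
    for enzyme, site in RECOGNITION_SITES.items():
        positions = []
        start = seq.find(site)
        while start != -1:
            positions.append(start + 1)        # convert 0-based index to 1-based
            start = seq.find(site, start + 1)  # continue; overlapping hits allowed
        if positions:
            hits[enzyme] = positions
    return hits


if __name__ == "__main__":
    for seq_id, seq in SEQUENCES.items():
        length = len(clean(seq))
        assert length == STATED_LENGTHS[seq_id], f"SEQ ID NO:{seq_id} length mismatch"
        print(f"SEQ ID NO:{seq_id}: {length} bp, sites {find_sites(seq)}")
```

Run on SEQ ID NO:53, the scan reports BamHI, EcoRI, SmaI, SalI, XhoI and NotI sites in that order, i.e. a multiple cloning site; SEQ ID NO:55 carries only EcoRI, XhoI and BamHI sites among the enzymes listed.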

We claim:
 1. A mosaic protein comprising at least three homologous antigenic peptides from different genotypes or subtypes of a species, wherein the antigenic peptides are selected from the amino acid sequences set forth in SEQ ID NOs:23-33.
 2. A method of detecting a hepatitis infection in an individual comprising combining a serum sample from the individual with the mosaic protein of claim 1, and detecting the presence of antibody binding to the mosaic protein, the presence of binding indicating a hepatitis infection in the individual.
 3. The method of claim 2, wherein the mosaic protein comprises the amino acid sequences set forth in SEQ ID NOs:23-33.
 4. The method of claim 2, wherein the mosaic protein comprises the amino acid sequences set forth in SEQ ID NOs:23-33, in numerical order.
 5. The mosaic protein of claim 1, comprising the amino acid sequences set forth in SEQ ID NOs:23-33.
 6. The mosaic protein of claim 1, comprising the amino acid sequences set forth in SEQ ID NOs:23-33, in numerical order.
 7. A mosaic protein, comprising at least three homologous antigenic peptides from different genotypes or subtypes of a species, wherein the mosaic protein has an amino acid sequence set forth in SEQ ID NO:52.
 8. A method of detecting a hepatitis infection in an individual comprising combining a serum sample from the individual with the mosaic protein of claim 7, and detecting the presence of antibody binding to the mosaic protein, the presence of binding indicating a hepatitis infection in the individual. 